This document covers the exploratory data analysis, model build and analysis for Coasting on Couches, the term project for MGT 6203 (Spring semester). While the project document and the slides/presentation give a more human-readable version, this document simply combines some exposition with a lot of code to generate graphs.
The code and data are available on a GitHub repo.
Although the geojson_read method works in most installations (and has been extensively tested on macOS Monterey), it has thrown errors in some Windows installations. We were made aware of this only yesterday and unfortunately do not have a resolution at the moment, mainly because it does not happen on our primary platform, macOS Monterey. We may issue a fix via our GitHub repo should a resolution be found. The attached HTML document shows the generated choropleth maps.
Please run the following chunk to ensure all the necessary libraries are installed and present should you wish to execute the chunks at your end. (Idea taken from this blogpost)
Please execute install.packages("pls") in the console if it’s the first time you’re running it. There could be some additional installations, depending on your operating system.
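The chunk itself is not reproduced in this rendering. As a minimal sketch of the install-if-missing pattern, the following assumes a package list inferred from the functions called later in this document (not necessarily the original list); `missing_packages` and `ensure_packages` are hypothetical helper names.

```r
# Assumed package list, inferred from the functions used in this document.
required_packages <- c("dplyr", "stringr", "plotly", "geojsonio",
                       "fastDummies", "DataExplorer", "pls")

# Return the subset of `pkgs` that is not installed yet.
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Install whatever is missing, then attach every package.
ensure_packages <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) install.packages(to_install)
  invisible(lapply(pkgs, library, character.only = TRUE))
}

# ensure_packages(required_packages)  # uncomment to run; needs a CRAN mirror
```

Checking with `requireNamespace` before installing avoids re-downloading packages that are already present.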
We had downloaded data from InsideAirbnb.com to the data folder. This will be available on our GitHub repo.
The data is in five parts (each city has all five elements):
1. listing: The actual Airbnb listings, a set of 74 variables pertaining to a specific listing. The columns are defined here.
2. review: List of reviews per row in the listing table.
3. calendar: Price of a particular listing on a particular date, along with the minimum and maximum nights for hire.
4. neighbourhoods: List of neighbourhoods screened in the city.
5. map: GeoJSON file showing district boundaries.
The data may be accessed here or here.
## 1.1 Raw
We will read all the data into dataset variables.
#Singapore
listing.sin <- read.csv("./data/SIN_listings.csv")
reviews.sin <- read.csv("./data/SIN_reviews.csv")
calendar.sin <- read.csv("./data/SIN_calendar.csv")
neighbourhoods.sin <- read.csv("./data/SIN_neighbourhoods.csv")
map.sin <- geojson_read("./data/SIN_neighbourhoods.geojson")
#Taipei
listing.tpe <- read.csv("./data/TPE_listings.csv")
reviews.tpe <- read.csv("./data/TPE_reviews.csv")
calendar.tpe <- read.csv("./data/TPE_calendar.csv")
neighbourhoods.tpe <- read.csv("./data/TPE_neighbourhoods.csv")
map.tpe <- geojson_read("./data/TPE_neighbourhoods.geojson")
#Tokyo
listing.nrt <- read.csv("./data/NRT_listings.csv")
reviews.nrt <- read.csv("./data/NRT_reviews.csv")
calendar.nrt <- read.csv("./data/NRT_calendar.csv")
neighbourhoods.nrt <- read.csv("./data/NRT_neighbourhoods.csv")
map.nrt <- geojson_read("./data/NRT_neighbourhoods.geojson")
#Hong Kong
listing.hkg <- read.csv("./data/HKG_listings.csv")
reviews.hkg <- read.csv("./data/HKG_reviews.csv")
calendar.hkg <- read.csv("./data/HKG_calendar.csv")
neighbourhoods.hkg <- read.csv("./data/HKG_neighbourhoods.csv")
map.hkg <- geojson_read("./data/HKG_neighbourhoods.geojson")
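Because the four cities follow the same file-naming scheme, the reads above could also be wrapped in a helper. This is only a sketch under that assumption; `city_paths` and `read_city` are hypothetical names, and the rest of this document keeps using the per-city variables defined above.

```r
# Build the five file paths for one city code, e.g. "SIN".
city_paths <- function(code, data_dir = "./data") {
  parts <- c("listings", "reviews", "calendar", "neighbourhoods")
  csv <- file.path(data_dir, paste0(code, "_", parts, ".csv"))
  names(csv) <- parts
  c(csv, map = file.path(data_dir, paste0(code, "_neighbourhoods.geojson")))
}

# Read all five elements for one city into a named list.
read_city <- function(code, data_dir = "./data") {
  p <- city_paths(code, data_dir)
  out <- lapply(p[names(p) != "map"], read.csv)
  out$map <- geojsonio::geojson_read(p[["map"]])
  out
}

# sin <- read_city("SIN")  # then sin$listings, sin$reviews, sin$map, ...
```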
We first see the number of listings per city.
cities <- c("Singapore", "Tokyo", "Taipei", "Hong Kong")
no_of_listings <- c(nrow(listing.sin), nrow(listing.nrt), nrow(listing.tpe), nrow(listing.hkg))
no_of_listings.fig <- plot_ly(
x = cities,
y = no_of_listings,
type = "bar",
text = no_of_listings
)
no_of_listings.fig <- no_of_listings.fig %>% layout(title ="No of Listings Per City", yaxis = list(title="No of Listings"))
no_of_listings.fig
Clearly, Tokyo has the largest number of listings followed by Hong Kong, Taipei and then Singapore.
Let us also consider heatmaps of where the listings are in each city by neighbourhood. Admittedly, the InsideAirbnb website has a map for each city (here’s one for Taipei), but this does not show by district. At a later stage, this can be enhanced to see by variable or time-series.
generate_choropleth_by_city <- function (listing, map, city_name)
{
listings_by_neighbourhood <- listing %>%
count(neighbourhood_cleansed)
# neighbourhoods_zero <- neighbourhoods %>%
# filter(!neighbourhood %in% listings_by_neighbourhood$neighbourhood) %>%
# mutate(n = 0) %>%
# select(neighbourhood, n)
# listings_by_neighbourhood <- union(listings_by_neighbourhood, neighbourhoods_zero)
# print(listings_by_neighbourhood)
g <- list (
fitbounds = "locations",
visible = FALSE
)
fig <- plot_ly()
fig <- fig %>% add_trace(
type="choropleth",
geojson=map,
locations=listings_by_neighbourhood$neighbourhood_cleansed,
z=listings_by_neighbourhood$n,
colorscale="Viridis",
featureidkey="properties.neighbourhood"
)
fig <- fig %>% layout(
geo = g
)
fig <- fig %>% colorbar(title = "No of listings")
fig <- fig %>% layout(
title = paste0("Listings by Neighbourhood - ", city_name)
)
fig
}
generate_choropleth_by_city(listing.sin, map.sin, "Singapore")
generate_choropleth_by_city(listing.nrt, map.nrt, "Tokyo")
generate_choropleth_by_city(listing.hkg, map.hkg, "Hong Kong")
generate_choropleth_by_city(listing.tpe, map.tpe, "Taipei")
bin_districts <- function(listing, bins)
{
district_bins <- listing %>%
count(neighbourhood_cleansed) %>%
arrange(desc(n))%>%
mutate(nb_group = ntile(n,n=bins)) %>%
arrange(desc(nb_group))
return(district_bins)
}
bin_districts(listing.sin, 4)
## neighbourhood_cleansed n nb_group
## 1 Kallang 405 4
## 2 Geylang 340 4
## 3 Downtown Core 317 4
## 4 Outram 309 4
## 5 Rochor 270 4
## 6 Novena 235 4
## 7 Bedok 186 4
## 8 Bukit Merah 178 4
## 9 River Valley 149 4
## 10 Queenstown 143 4
## 11 Singapore River 108 4
## 12 Tanglin 90 3
## 13 Orchard 80 3
## 14 Clementi 72 3
## 15 Jurong East 71 3
## 16 Jurong West 59 3
## 17 Marine Parade 57 3
## 18 Newton 57 3
## 19 Bukit Timah 56 3
## 20 Woodlands 44 3
## 21 Hougang 43 3
## 22 Toa Payoh 40 3
## 23 Bishan 39 2
## 24 Serangoon 35 2
## 25 Pasir Ris 33 2
## 26 Bukit Batok 31 2
## 27 Tampines 30 2
## 28 Ang Mo Kio 26 2
## 29 Sembawang 26 2
## 30 Sengkang 19 2
## 31 Yishun 19 2
## 32 Southern Islands 18 2
## 33 Punggol 17 2
## 34 Choa Chu Kang 17 1
## 35 Museum 15 1
## 36 Bukit Panjang 14 1
## 37 Central Water Catchment 12 1
## 38 Marina South 3 1
## 39 Western Water Catchment 3 1
## 40 Mandai 2 1
## 41 Lim Chu Kang 1 1
## 42 Pioneer 1 1
## 43 Sungei Kadut 1 1
## 44 Tuas 1 1
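One property of `ntile` is worth keeping in mind when reading the table above: it fills equal-sized buckets by rank, so districts with identical counts can land in different groups (Punggol and Choa Chu Kang both have 17 listings but sit in groups 2 and 1). A minimal sketch on a toy vector:

```r
library(dplyr)

# Four identical counts split into two buckets: ties are broken by position.
tied_counts <- c(17, 17, 17, 17)
tied_groups <- ntile(tied_counts, 2)
tied_groups  # 1 1 2 2
```

If tied districts should always share a group, a cut on the count itself (for example with `cut()`) would be one alternative, at the cost of unequal group sizes.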
We could also view this as bar charts.
bar_charts_by_neighbourhood <- function (listing, city_name, neighbourhoods)
{
listings_by_neighbourhood <- listing %>%
count(neighbourhood_cleansed) %>%
# rename(neighbourhood = neighbourhood_cleansed)
arrange(desc(n))
# print(listings_by_neighbourhood)
neighbourhoods_zero <- neighbourhoods %>%
filter(!neighbourhood %in% listings_by_neighbourhood$neighbourhood_cleansed) %>%
rename(neighbourhood_cleansed = neighbourhood) %>%
mutate(n = 0) %>%
select(neighbourhood_cleansed, n)
print(neighbourhoods_zero)
listings_by_neighbourhood <- union(listings_by_neighbourhood, neighbourhoods_zero)
# print(listings_by_neighbourhood)
fig<- plot_ly(y=listings_by_neighbourhood$neighbourhood_cleansed, x=listings_by_neighbourhood$n, type="bar", orientation="h") %>%
layout(yaxis=list(categoryorder = "total ascending"), title=paste("Listings per neighbourhood in", city_name))
fig
}
bar_charts_by_neighbourhood(listing.sin, "Singapore", neighbourhoods.sin)
## neighbourhood_cleansed n
## 1 Marina East 0
## 2 Straits View 0
## 3 Changi 0
## 4 Changi Bay 0
## 5 Paya Lebar 0
## 6 North-Eastern Islands 0
## 7 Seletar 0
## 8 Simpang 0
## 9 Boon Lay 0
## 10 Tengah 0
## 11 Western Islands 0
bar_charts_by_neighbourhood(listing.nrt, "Tokyo", neighbourhoods.nrt)
## neighbourhood_cleansed n
## 1 Aogashima Mura 0
## 2 Fussa Shi 0
## 3 Hachijo Machi 0
## 4 Higashiyamato Shi 0
## 5 Hinode Machi 0
## 6 Hinohara Mura 0
## 7 Inagi Shi 0
## 8 Kiyose Shi 0
## 9 Kozushima Mura 0
## 10 Mikurajima Mura 0
## 11 Miyake Mura 0
## 12 Mizuho Machi 0
## 13 Niijima Mura 0
## 14 Ogasawara Mura 0
## 15 Oshima Machi 0
## 16 Toshima Mura 0
bar_charts_by_neighbourhood(listing.hkg, "Hong Kong", neighbourhoods.hkg)
## [1] neighbourhood_cleansed n
## <0 rows> (or 0-length row.names)
bar_charts_by_neighbourhood(listing.tpe, "Taipei", neighbourhoods.tpe)
## [1] neighbourhood_cleansed n
## <0 rows> (or 0-length row.names)
The majority of listings in Taipei, Tokyo and Hong Kong are in tourist-heavy districts. While the top district in Tokyo is Shinjuku, that in Hong Kong is Yau Tsim Mong, Kowloon’s core urban area formed by the combination of Yau Ma Tei, Tsim Sha Tsui and Mong Kok. Taipei’s top district is Zhongzheng district (“中正區”), consisting of historic sites and cultural performances.
In contrast to the other three cities, the top district in Singapore is Kallang, a high-end residential condo district that is not tourist-heavy but is close to downtown’s many attractions. The tourist-heavy Geylang, Downtown Core and Outram districts follow Kallang.
Taipei and Hong Kong have listings in all districts. But 11 districts in Singapore (mostly military installations or the airport) and 16 districts in Tokyo do not have any listings.
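The zero-listing districts can also be derived with a join instead of a filter followed by a union. The sketch below uses toy tables standing in for the listing and neighbourhoods data; the toy values are made up for illustration.

```r
library(dplyr)

# Toy inputs standing in for a listing table and a neighbourhoods table.
toy_listing <- data.frame(neighbourhood_cleansed = c("Kallang", "Kallang", "Geylang"))
toy_neighbourhoods <- data.frame(neighbourhood = c("Kallang", "Geylang", "Changi"))

# Start from the full district list, attach the counts, and turn missing counts into 0.
counts_with_zeros <- toy_neighbourhoods %>%
  rename(neighbourhood_cleansed = neighbourhood) %>%
  left_join(count(toy_listing, neighbourhood_cleansed), by = "neighbourhood_cleansed") %>%
  mutate(n = coalesce(n, 0L))

counts_with_zeros
```

Starting from the district list guarantees one row per district, whether or not any listing falls inside it.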
There’s a column called amenities in the dataset that appears to hold all the self-reported amenities of a listing as a single comma-separated list. Let’s look at it more closely.
For instance, here’s the longest list of amenities among Singapore listings.
listing_amenities.sin <- listing.sin %>%
mutate(amenities = str_replace(amenities, "\\[","")) %>%
mutate(amenities = str_replace(amenities, "\\]","")) %>%
mutate(amenities = str_replace_all(amenities, "\"","")) %>%
mutate(amenities = str_replace_all(amenities, ", " ,",")) %>%
mutate(amenities_list = as.list(strsplit(amenities, ","))) %>%
mutate(no_of_am = lengths(amenities_list)) %>%
mutate(Wifi = as.numeric(grepl('Wifi', amenities, fixed = TRUE))) %>%
mutate(Shampoo = as.numeric(grepl('Shampoo', amenities, fixed = TRUE))) %>%
mutate(Kitchen = as.numeric(grepl('Kitchen', amenities, fixed = TRUE)))
# listing_amenities.sin %>% select(amenities, Wifi, Shampoo, Kitchen, Patio)
max_amenities.sin <- listing_amenities.sin %>%
select(amenities, no_of_am) %>%
group_by() %>%
slice(which.max(no_of_am))
amenities_list_string <- as.list(strsplit(as.character(max_amenities.sin["amenities"]), ","))
amenities_list_string
## [[1]]
## [1] "Toaster" "Sound system"
## [3] "Safe" "Indoor fireplace"
## [5] "Backyard" "Hangers"
## [7] "Bed linens" "Hot water kettle"
## [9] "Freezer" "Coffee maker"
## [11] "Washer" "Cooking basics"
## [13] "Bathtub" "Hair dryer"
## [15] "Clothing storage" "Oven"
## [17] "Outdoor furniture" "Paid parking on premises"
## [19] "High chair" "Children\\u2019s books and toys"
## [21] "Dedicated workspace" "Crib"
## [23] "Dining table" "Free parking on premises"
## [25] "Pool" "Cleaning products"
## [27] "Wine glasses" "Cleaning before checkout"
## [29] "Game console" "Long term stays allowed"
## [31] "Drying rack for clothing" "Outdoor dining area"
## [33] "Private entrance" "Elevator"
## [35] "Patio or balcony" "Refrigerator"
## [37] "Dryer" "Microwave"
## [39] "Baby bath" "Gym"
## [41] "Wifi" "Children\\u2019s dinnerware"
## [43] "Smoke alarm" "Board games"
## [45] "Luggage dropoff allowed" "Shampoo"
## [47] "Breakfast" "Extra pillows and blankets"
## [49] "Heating" "Conditioner"
## [51] "Cable TV" "Hot tub"
## [53] "Hot water" "Stove"
## [55] "Body soap" "BBQ grill"
## [57] "Iron" "Essentials"
## [59] "Babysitter recommendations" "Pack \\u2019n play/Travel crib"
## [61] "Kitchen" "Changing table"
## [63] "First aid kit" "Dishes and silverware"
## [65] "Air conditioning" "Shower gel"
## [67] "TV with standard cable" "Fire extinguisher"
#"Shampoo,Kitchen,Long term stays allowed,Washer,Smart lock,Hair dryer,Dryer,Wifi,Hot water,TV,Air conditioning,Smoke alarm,Fire extinguisher"
Apropos of nothing, we will use the following amenities as dummy variables for price:
> “Shampoo,Kitchen,Long term stays allowed,Washer,Hair dryer,Wifi,Hot water,TV,Air conditioning”
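The raw amenities value looks like a JSON array (square brackets, double-quoted strings, `\u2019` escapes). Assuming it is valid JSON, a parser can split it and decode the escapes in one step. This sketch uses `jsonlite`, which is not part of the original analysis, and a made-up sample string.

```r
library(jsonlite)

# Made-up sample in the same shape as one raw `amenities` cell.
raw_amenities <- '["Wifi", "Kitchen", "Children\\u2019s books and toys"]'

# fromJSON() returns a character vector and decodes the \u2019 escape.
amenities <- fromJSON(raw_amenities)
length(amenities)
```

Parsing this way also keeps an amenity that contains a comma as a single element, which a plain split on "," would break apart.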
Similarly, let us also further analyse the column host_verifications to see if we can generate dummy variables from there as well.
listing_host_verf.sin <- listing.sin %>%
mutate(host_verifications = str_replace(host_verifications, "\\[","")) %>%
mutate(host_verifications = str_replace(host_verifications, "\\]","")) %>%
mutate(host_verifications = str_replace_all(host_verifications, "\"","")) %>%
mutate(host_verifications = str_replace_all(host_verifications, ", " ,",")) %>%
mutate(host_verifications_list = as.list(strsplit(host_verifications, ","))) %>%
mutate(no_of_vf = lengths(host_verifications_list))
max_verf.sin <- listing_host_verf.sin %>%
select(host_verifications, no_of_vf) %>%
group_by() %>%
slice(which.max(no_of_vf))
host_verf_list_string <- as.list(strsplit(as.character(max_verf.sin["host_verifications"]), ","))
host_verf_list_string
## [[1]]
## [1] "'email'" "'phone'"
## [3] "'facebook'" "'google'"
## [5] "'reviews'" "'jumio'"
## [7] "'offline_government_id'" "'selfie'"
## [9] "'government_id'" "'identity_manual'"
## [11] "'work_email'"
Let’s take this list to generate dummy variables.
> [‘email’, ‘phone’, ‘facebook’, ‘reviews’, ‘manual_offline’, ‘jumio’, ‘offline_government_id’, ‘government_id’, ‘work_email’]
Let’s generalise these two bits for all cities and create dummy variables for each one of them.
wrangle_amenities_hostvf <- function (listing)
{
listing <- listing %>%
mutate(amenities = str_replace(amenities, "\\[","")) %>%
mutate(amenities = str_replace(amenities, "\\]","")) %>%
mutate(amenities = str_replace_all(amenities, "\"","")) %>%
mutate(amenities = str_replace_all(amenities, ", " ,",")) %>%
mutate(amenities_list = as.list(strsplit(amenities, ","))) %>%
mutate(no_of_am = lengths(amenities_list)) %>%
mutate(Amenities_Wifi = as.numeric(grepl('Wifi', amenities, fixed = TRUE))) %>%
mutate(Amenities_Shampoo = as.numeric(grepl('Shampoo', amenities, fixed = TRUE))) %>%
mutate(Amenities_Kitchen = as.numeric(grepl('Kitchen', amenities, fixed = TRUE))) %>%
mutate(Amenities_Long_Term = as.numeric(grepl('Long term stays', amenities, fixed = TRUE))) %>%
mutate(Amenities_Washer = as.numeric(grepl('Washer', amenities, fixed = TRUE))) %>%
mutate(Amenities_HairDryer = as.numeric(grepl('Hair dryer', amenities, fixed = TRUE))) %>%
mutate(Amenities_HotWater = as.numeric(grepl('Hot water', amenities, fixed = TRUE))) %>%
mutate(Amenities_TV = as.numeric(grepl('TV', amenities, fixed = TRUE))) %>%
mutate(Amenities_AC = as.numeric(grepl('Air conditioning', amenities, fixed = TRUE))) %>%
mutate(host_verifications = str_replace(host_verifications, "\\[","")) %>%
mutate(host_verifications = str_replace(host_verifications, "\\]","")) %>%
mutate(host_verifications = str_replace_all(host_verifications, "\"","")) %>%
mutate(host_verifications = str_replace_all(host_verifications, ", " ,",")) %>%
mutate(host_verifications_list = as.list(strsplit(host_verifications, ","))) %>%
mutate(hv_email = as.numeric(grepl('email', host_verifications, fixed = TRUE))) %>%
mutate(hv_phone = as.numeric(grepl('phone', host_verifications, fixed = TRUE))) %>%
mutate(hv_facebook = as.numeric(grepl('facebook', host_verifications, fixed = TRUE))) %>%
mutate(hv_reviews = as.numeric(grepl('reviews', host_verifications, fixed = TRUE))) %>%
mutate(hv_manual_offline = as.numeric(grepl('manual_offline', host_verifications, fixed = TRUE))) %>%
mutate(hv_manual_jumio = as.numeric(grepl('jumio', host_verifications, fixed = TRUE))) %>%
mutate(hv_manual_off_gov = as.numeric(grepl('offline_government_id', host_verifications, fixed = TRUE))) %>%
mutate(hv_manual_gov = as.numeric(grepl('government_id', host_verifications, fixed = TRUE))) %>%
mutate(hv_manual_work_email = as.numeric(grepl('work_email', host_verifications, fixed = TRUE))) %>%
mutate(no_of_vf = lengths(host_verifications_list))
}
listing.sin <- wrangle_amenities_hostvf(listing.sin)
listing.nrt <- wrangle_amenities_hostvf(listing.nrt)
listing.tpe <- wrangle_amenities_hostvf(listing.tpe)
listing.hkg <- wrangle_amenities_hostvf(listing.hkg)
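Note that `grepl(..., fixed = TRUE)` is a substring test, so 'email' also matches 'work_email', 'government_id' also matches 'offline_government_id', and 'Hot water' also matches 'Hot water kettle'. If whole-element matching is wanted instead, a sketch along the following lines could be used; `split_tokens` and `has_token` are hypothetical helpers and are not what the function above does.

```r
# Split a bracketed, quoted, comma-separated string into clean tokens.
split_tokens <- function(x) {
  cleaned <- gsub("\\[|\\]|'|\"", "", x)
  lapply(strsplit(cleaned, ",", fixed = TRUE), trimws)
}

# 1 if `token` appears as a whole element of the list, 0 otherwise.
has_token <- function(x, token) {
  as.numeric(vapply(split_tokens(x), function(tokens) token %in% tokens, logical(1)))
}

verifications <- c("['phone', 'work_email']", "['email', 'phone']")
grepl("email", verifications, fixed = TRUE)  # substring test: TRUE TRUE
has_token(verifications, "email")            # whole-element test: 0 1
```

Switching to whole-element matching would change the dummy variables, so any model fitted on them would need to be re-run.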
This function wrangles Airbnb data into an analysable form. Because we will be doing the same for multiple cities, we wrap the steps in a function. The function builds on code shared in the lecture for Module 2. The obvious additions are the id column, neighbourhoods and dummy variables for amenities and host verification.
wrangle_airbnb_dataset <- function (raw_listing_full)
{
listing.raw <- raw_listing_full %>%
select(id, price,number_of_reviews,beds,bathrooms,accommodates,reviews_per_month, property_type, room_type, review_scores_rating, neighbourhood_cleansed, host_response_time, host_response_rate, host_acceptance_rate, host_is_superhost, latitude, longitude, amenities, last_review, no_of_am, Amenities_Wifi, Amenities_Shampoo, Amenities_Kitchen, Amenities_Long_Term, Amenities_Washer, Amenities_HairDryer, Amenities_HotWater, Amenities_TV,Amenities_AC, host_verifications, hv_email,hv_phone, hv_facebook, hv_reviews, hv_manual_offline, hv_manual_jumio,hv_manual_off_gov, hv_manual_gov, hv_manual_work_email, no_of_vf) %>%
rename(Reviews = number_of_reviews) %>%
rename(Beds = beds) %>%
rename(Baths = bathrooms) %>%
rename(Capacity = accommodates) %>%
rename(Monthly_Reviews = reviews_per_month) %>%
rename(Property_Type = property_type) %>%
rename(Room_Type = room_type) %>%
rename(Price = price) %>%
rename(Rating = review_scores_rating) %>%
# rename(Neighbourhood = neighbourhood_cleansed) %>%
rename(host_Superhost = host_is_superhost)
listing.raw <- listing.raw %>%
mutate(Price = str_replace(Price, "[$]", "")) %>%
mutate(Price = str_replace(Price, "[,]", "")) %>%
mutate(Price = as.numeric(Price)) %>%
# mutate(hood_factor = as.factor(Neighbourhood)) %>%
mutate(host_response_rate = str_replace(host_response_rate, "[%]", "")) %>%
mutate(host_response_rate = as.numeric(host_response_rate)/100) %>%
mutate(host_acceptance_rate = str_replace(host_acceptance_rate, "[%]", "")) %>%
mutate(host_acceptance_rate = as.numeric(host_acceptance_rate)/100) %>%
mutate(host_Superhost = ifelse(host_Superhost =="f", 0, 1)) %>%
# Response-time categories come from host_response_time; host_response_rate stays numeric
mutate(host_response_time = factor(host_response_time, levels = c("within an hour", "within a few hours", "within a day", "a few days or more"))) %>%
mutate(host_response_hours = ifelse(host_response_time == "within a few hours", 1, 0)) %>%
mutate(host_response_day = ifelse(host_response_time == "within a day", 1, 0)) %>%
mutate(host_response_few_days = ifelse(host_response_time == "a few days or more", 1, 0)) %>%
mutate(last_review = as.Date(last_review)) %>%
mutate(Days_since_last_review = as.numeric(difftime(as.Date("2021-12-31"), last_review, units="days"))) %>%
mutate(Room_Type = factor(Room_Type, levels = c("Shared room", "Private room", "Entire home/apt"))) %>%
mutate(Capacity_Sqr = Capacity * Capacity) %>%
mutate(Beds_Sqr = Beds * Beds) %>%
mutate(Baths_Sqr = Baths * Baths) %>%
mutate(ln_Price = log(1+Price)) %>%
mutate(ln_Beds = log(1+Beds)) %>%
mutate(ln_Baths = log(1+Baths)) %>%
mutate(ln_Capacity = log(1+Capacity)) %>%
mutate(ln_Rating = log(1+Rating)) %>%
mutate(Shared_ind = ifelse(Room_Type == "Shared room",1,0)) %>%
mutate(House_ind = ifelse(Room_Type == "Entire home/apt",1,0)) %>%
mutate(Private_ind = ifelse(Room_Type == "Private room",1,0)) %>%
mutate(Capacity_x_Shared_ind = Shared_ind * Capacity) %>%
mutate(H_Cap = House_ind * Capacity) %>%
mutate(P_Cap = Private_ind * Capacity) %>%
mutate(ln_Capacity_x_Shared_ind = Shared_ind * ln_Capacity) %>%
mutate(ln_Capacity_x_House_ind = House_ind * ln_Capacity) %>%
mutate(ln_Capacity_x_Private_ind = Private_ind * ln_Capacity) %>%
filter(!is.na(Price))
return(listing.raw)
}
list.sin <- wrangle_airbnb_dataset(listing.sin)
list.nrt <- wrangle_airbnb_dataset(listing.nrt)
list.tpe <- wrangle_airbnb_dataset(listing.tpe)
list.hkg <- wrangle_airbnb_dataset(listing.hkg)
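On price cleaning: `str_replace` touches only the first match, so a price string with more than one thousands separator needs `str_replace_all` or a single `gsub`. A base-R sketch on made-up price strings (`parse_price` is a hypothetical helper):

```r
# Strip every "$" and "," before converting to numeric.
parse_price <- function(x) as.numeric(gsub("[$,]", "", x))

parsed <- parse_price(c("$85.00", "$2,500.00", "$1,234,567.00"))
parsed
```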
There’s value in understanding how many reviews a property has received in the last 12 months as a measure of how active a property is. The notion is that modelling price for active listings will be more accurate than modelling price for all listings.
The approach taken in this paper was to look at listings active in the past 12 months. However, given the restrictions caused by the pandemic, we felt it would be better to look at the period between 1 Jan 2019 and 31 Dec 2021, to include one pre-pandemic year in addition to the two pandemic years, 2020 and 2021.
We check this by wrangling the review dataset.
count_reviews <- function(listings, reviews, from_date, to_date)
{
reviews_grouped <- reviews %>%
mutate(date = as.Date(date)) %>%
filter(between(date, as.Date(from_date), as.Date(to_date))) %>%
group_by(listing_id) %>%
summarise(reviews_since_2019 = n()) %>%
mutate(bookings_since_2019 = reviews_since_2019*2) %>%
rename(id = listing_id)
listings <- left_join(listings, reviews_grouped, by="id")
return(listings)
}
start_date = "2019-1-1"
end_date = "2021-12-31"
list.sin <-count_reviews(list.sin, reviews.sin,start_date, end_date)
list.hkg <- count_reviews(list.hkg, reviews.hkg,start_date, end_date)
list.tpe <- count_reviews(list.tpe, reviews.tpe,start_date, end_date)
list.nrt <- count_reviews(list.nrt, reviews.nrt,start_date, end_date)
list_after_2019.sin <- list.sin %>% filter(!is.na(reviews_since_2019))
list_after_2019.tpe <- list.tpe %>% filter(!is.na(reviews_since_2019))
list_after_2019.nrt <- list.nrt %>% filter(!is.na(reviews_since_2019))
list_after_2019.hkg <- list.hkg %>% filter(!is.na(reviews_since_2019))
Let’s look at the listings reviewed since 2019 in map form.
generate_choropleth_by_city(list_after_2019.sin, map.sin, "Singapore")
generate_choropleth_by_city(list_after_2019.nrt, map.nrt, "Tokyo")
generate_choropleth_by_city(list_after_2019.hkg, map.hkg, "Hong Kong")
generate_choropleth_by_city(list_after_2019.tpe, map.tpe, "Taipei")
bar_charts_by_neighbourhood(list_after_2019.sin, "Singapore", neighbourhoods.sin)
## neighbourhood_cleansed n
## 1 Marina East 0
## 2 Straits View 0
## 3 Changi 0
## 4 Changi Bay 0
## 5 Paya Lebar 0
## 6 North-Eastern Islands 0
## 7 Seletar 0
## 8 Simpang 0
## 9 Sungei Kadut 0
## 10 Boon Lay 0
## 11 Pioneer 0
## 12 Tengah 0
## 13 Tuas 0
## 14 Western Islands 0
## 15 Western Water Catchment 0
bar_charts_by_neighbourhood(list_after_2019.nrt, "Tokyo", neighbourhoods.nrt)
## neighbourhood_cleansed n
## 1 Aogashima Mura 0
## 2 Fussa Shi 0
## 3 Hachijo Machi 0
## 4 Higashiyamato Shi 0
## 5 Hinode Machi 0
## 6 Hinohara Mura 0
## 7 Inagi Shi 0
## 8 Kiyose Shi 0
## 9 Kozushima Mura 0
## 10 Mikurajima Mura 0
## 11 Miyake Mura 0
## 12 Mizuho Machi 0
## 13 Niijima Mura 0
## 14 Ogasawara Mura 0
## 15 Oshima Machi 0
## 16 Toshima Mura 0
bar_charts_by_neighbourhood(list_after_2019.hkg, "Hong Kong", neighbourhoods.hkg)
## [1] neighbourhood_cleansed n
## <0 rows> (or 0-length row.names)
bar_charts_by_neighbourhood(list_after_2019.tpe, "Taipei", neighbourhoods.tpe)
## [1] neighbourhood_cleansed n
## <0 rows> (or 0-length row.names)
add_earnings <- function(listing)
{
return (listing %>% mutate(earnings_since_2019 = bookings_since_2019 * 3 * Price))
}
list_after_2019.sin <- add_earnings(list_after_2019.sin)
list_after_2019.tpe <- add_earnings(list_after_2019.tpe)
list_after_2019.nrt <- add_earnings(list_after_2019.nrt)
list_after_2019.hkg <- add_earnings(list_after_2019.hkg)
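The earnings estimate chains two multipliers: `count_reviews` doubles the review count to get bookings, and `add_earnings` multiplies bookings by 3 and by the nightly price. Reading the 2 as "roughly one review per two bookings" and the 3 as "an average stay of three nights" is our interpretation here, not something stated in the code. A worked example with made-up numbers:

```r
reviews_in_window  <- 10   # made-up review count for one listing
review_to_booking  <- 2    # multiplier used in count_reviews()
nights_per_booking <- 3    # multiplier used in add_earnings()
price              <- 100  # made-up nightly price

bookings <- reviews_in_window * review_to_booking
earnings <- bookings * nights_per_booking * price
earnings  # 6000
```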
Let’s group listings into groups of neighbourhoods: extremely popular, popular, moderate, not so popular, and sparse.
district_bins.sin <- bin_districts(list_after_2019.sin, bins=5)
district_bins.sin
## neighbourhood_cleansed n nb_group
## 1 Geylang 224 5
## 2 Kallang 221 5
## 3 Outram 201 5
## 4 Rochor 136 5
## 5 Downtown Core 134 5
## 6 Bedok 101 5
## 7 Bukit Merah 94 5
## 8 Novena 73 5
## 9 River Valley 67 4
## 10 Queenstown 52 4
## 11 Tanglin 38 4
## 12 Singapore River 36 4
## 13 Jurong West 32 4
## 14 Marine Parade 27 4
## 15 Jurong East 24 4
## 16 Orchard 24 4
## 17 Newton 22 3
## 18 Woodlands 22 3
## 19 Bukit Timah 21 3
## 20 Serangoon 21 3
## 21 Clementi 17 3
## 22 Bishan 16 3
## 23 Tampines 16 3
## 24 Hougang 15 3
## 25 Toa Payoh 13 2
## 26 Museum 9 2
## 27 Punggol 9 2
## 28 Ang Mo Kio 8 2
## 29 Yishun 8 2
## 30 Choa Chu Kang 7 2
## 31 Central Water Catchment 6 2
## 32 Pasir Ris 6 2
## 33 Bukit Batok 6 1
## 34 Bukit Panjang 6 1
## 35 Sengkang 5 1
## 36 Southern Islands 3 1
## 37 Marina South 2 1
## 38 Lim Chu Kang 1 1
## 39 Mandai 1 1
## 40 Sembawang 1 1
district_bins.nrt <- bin_districts(list_after_2019.nrt, bins=5)
district_bins.nrt
## neighbourhood_cleansed n nb_group
## 1 Shinjuku Ku 1653 5
## 2 Taito Ku 1147 5
## 3 Sumida Ku 810 5
## 4 Toshima Ku 707 5
## 5 Shibuya Ku 509 5
## 6 Ota Ku 357 5
## 7 Minato Ku 342 5
## 8 Chuo Ku 330 5
## 9 Nakano Ku 255 5
## 10 Setagaya Ku 251 4
## 11 Katsushika Ku 216 4
## 12 Kita Ku 181 4
## 13 Suginami Ku 177 4
## 14 Arakawa Ku 153 4
## 15 Shinagawa Ku 137 4
## 16 Koto Ku 136 4
## 17 Edogawa Ku 135 4
## 18 Itabashi Ku 110 4
## 19 Chiyoda Ku 110 3
## 20 Bunkyo Ku 106 3
## 21 Adachi Ku 73 3
## 22 Meguro Ku 47 3
## 23 Nerima Ku 44 3
## 24 Hachioji Shi 18 3
## 25 Hino Shi 15 3
## 26 Machida Shi 14 3
## 27 Chofu Shi 12 3
## 28 Fuchu Shi 11 2
## 29 Kokubunji Shi 10 2
## 30 Mitaka Shi 9 2
## 31 Akiruno Shi 7 2
## 32 Higashimurayama Shi 7 2
## 33 Kunitachi Shi 7 2
## 34 Musashino Shi 7 2
## 35 Tachikawa Shi 7 2
## 36 Tama Shi 7 2
## 37 Nishitokyo Shi 6 1
## 38 Kodaira Shi 5 1
## 39 Ome Shi 5 1
## 40 Komae Shi 4 1
## 41 Hamura Shi 3 1
## 42 Musashimurayama Shi 3 1
## 43 Okutama Machi 3 1
## 44 Akishima Shi 2 1
## 45 Higashikurume Shi 2 1
## 46 Koganei Shi 2 1
district_bins.hkg <- bin_districts(list_after_2019.hkg, bins=5)
district_bins.hkg
## neighbourhood_cleansed n nb_group
## 1 Yau Tsim Mong 1208 5
## 2 Wan Chai 311 5
## 3 Central & Western 279 5
## 4 Islands 211 4
## 5 Kowloon City 96 4
## 6 Eastern 70 4
## 7 Yuen Long 66 3
## 8 North 65 3
## 9 Sai Kung 47 3
## 10 Sham Shui Po 34 3
## 11 Sha Tin 24 2
## 12 Southern 24 2
## 13 Tai Po 19 2
## 14 Tuen Mun 12 2
## 15 Kwun Tong 9 1
## 16 Tsuen Wan 4 1
## 17 Kwai Tsing 3 1
## 18 Wong Tai Sin 3 1
district_bins.tpe <- bin_districts(list_after_2019.tpe, bins=5)
district_bins.tpe
## neighbourhood_cleansed n nb_group
## 1 萬華區 536 5
## 2 中正區 475 5
## 3 大安區 454 4
## 4 中山區 407 4
## 5 信義區 283 3
## 6 大同區 153 3
## 7 松山區 142 2
## 8 士林區 119 2
## 9 文山區 64 2
## 10 北投區 54 1
## 11 內湖區 49 1
## 12 南港區 24 1
list_after_2019.sin <- left_join(list_after_2019.sin, district_bins.sin %>% select(neighbourhood_cleansed, nb_group), by="neighbourhood_cleansed")
list_after_2019.nrt <- left_join(list_after_2019.nrt, district_bins.nrt %>% select(neighbourhood_cleansed, nb_group), by="neighbourhood_cleansed")
list_after_2019.hkg <- left_join(list_after_2019.hkg, district_bins.hkg %>% select(neighbourhood_cleansed, nb_group), by="neighbourhood_cleansed")
list_after_2019.tpe <- left_join(list_after_2019.tpe, district_bins.tpe %>% select(neighbourhood_cleansed, nb_group), by="neighbourhood_cleansed")
# list_after_2019.sin
list_after_2019.sin <- dummy_cols(list_after_2019.sin, select_columns = "nb_group", remove_selected_columns = TRUE)
list_after_2019.nrt <- dummy_cols(list_after_2019.nrt, select_columns = "nb_group", remove_selected_columns = TRUE)
list_after_2019.tpe <- dummy_cols(list_after_2019.tpe, select_columns = "nb_group", remove_selected_columns = TRUE)
list_after_2019.hkg <- dummy_cols(list_after_2019.hkg, select_columns = "nb_group", remove_selected_columns = TRUE)
list_after_2019.sin_remove <- dummy_cols(list_after_2019.sin, select_columns = c("Property_Type","Room_Type"), remove_selected_columns = TRUE)
list_after_2019.nrt_remove <- dummy_cols(list_after_2019.nrt, select_columns = c("Property_Type","Room_Type"), remove_selected_columns = TRUE)
list_after_2019.tpe_remove <- dummy_cols(list_after_2019.tpe, select_columns = c("Property_Type","Room_Type"), remove_selected_columns = TRUE)
list_after_2019.hkg_remove <- dummy_cols(list_after_2019.hkg, select_columns = c("Property_Type","Room_Type"), remove_selected_columns = TRUE)
# list_after_2019.sin
# list_after_2019.nrt
# list_after_2019.hkg
# list_after_2019.tpe
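To make the effect of `dummy_cols` concrete, here is a sketch on a toy data frame (the values are made up): each distinct value of `nb_group` becomes its own 0/1 column named `nb_group_<value>`, and the original column is dropped.

```r
library(fastDummies)

# Toy frame with two of the five neighbourhood groups.
toy <- data.frame(id = 1:3, nb_group = c(5, 4, 5))

toy_dummies <- dummy_cols(toy, select_columns = "nb_group",
                          remove_selected_columns = TRUE)
names(toy_dummies)
```

All levels are kept as columns here; when these feed a regression with an intercept, one level usually has to be left out (for instance via `remove_first_dummy = TRUE`) to avoid perfect collinearity.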
This reduces the number of listings, and hopefully, quite a few outliers.
cities <- c("Singapore", "Tokyo", "Taipei", "Hong Kong")
no_of_listings <- c(nrow(listing.sin), nrow(listing.nrt), nrow(listing.tpe), nrow(listing.hkg))
no_of_listings_after_2019 <- c(nrow(list_after_2019.sin), nrow(list_after_2019.nrt), nrow(list_after_2019.tpe), nrow(list_after_2019.hkg))
data <- data.frame(cities, no_of_listings, no_of_listings_after_2019)
no_of_listings.fig <- plot_ly(data,
x = cities,
y = ~no_of_listings,
type = "bar",
text = no_of_listings,
name = "No of Listings (All Years)"
)
no_of_listings.fig <- no_of_listings.fig %>% add_trace(y = ~no_of_listings_after_2019, text= no_of_listings_after_2019, name = "No of Active Listings")
no_of_listings.fig <- no_of_listings.fig %>% layout(title ="No of Listings Per City", yaxis = list(title="No of Listings"))
no_of_listings.fig
This roughly halves the number of listings being considered in Hong Kong and Singapore, but not in Taipei and Tokyo.
We now attempt to check on variables for each city.
data_exploration <- function (listing)
{
plot_str(listing, type="r")
introduce(listing)
plot_intro(listing)
plot_missing(listing)
plot_bar(listing)
# Use the function argument rather than list.sin so each city is analysed on its own data
pca_df <- na.omit(listing[, c("Price", "Room_Type", "Reviews", "Beds", "Capacity", "Monthly_Reviews", "host_Superhost", "Rating")])#,"Days_since_last_review", "host_response_rate", "host_response_hours", "host_acceptance_rate","host_response_day", "host_response_few_days")])
plot_qq(pca_df)
plot_prcomp(pca_df, variance_cap = 0.9, nrow = 2L, ncol=2L)
}
data_exploration(list.sin)
## 4 columns ignored with more than 50 categories.
## Property_Type: 51 categories
## amenities: 2815 categories
## last_review: 962 categories
## host_verifications: 140 categories
### 2.1.2 Taipei
data_exploration(list.tpe)
## 4 columns ignored with more than 50 categories.
## Property_Type: 58 categories
## amenities: 3376 categories
## last_review: 1050 categories
## host_verifications: 158 categories
data_exploration(list.hkg)
## 4 columns ignored with more than 50 categories.
## Property_Type: 69 categories
## amenities: 3846 categories
## last_review: 1125 categories
## host_verifications: 150 categories
data_exploration(list.nrt)
## 4 columns ignored with more than 50 categories.
## Property_Type: 64 categories
## amenities: 7467 categories
## last_review: 987 categories
## host_verifications: 192 categories
We will now check for outliers in our data across various parameters, filtering for listings that have seen at least one booking since 1 Jan 2019, starting with the Singapore data.
generate_price_boxplot <- function (listing.clean, city, comparison_col = "")
{
# png(file = "./graphs/boxplot.png")
if (comparison_col == "")
{
boxplot(listing.clean$Price, data = listing.clean, ylab="Price", main=paste("Boxplot: Price for", city))
}
else
boxplot(listing.clean$Price ~ listing.clean[[comparison_col]], data = listing.clean, ylab="Price", xlab=comparison_col, main=paste("Boxplot: Price vs", comparison_col, "for", city))
# dev.off()
}
generate_price_boxplot(list_after_2019.sin, "Singapore")
generate_price_boxplot(list_after_2019.sin, "Singapore", "Room_Type")
generate_price_boxplot(list_after_2019.sin, "Singapore", "Property_Type")
generate_price_boxplot(list_after_2019.sin, "Singapore", "Capacity")
generate_price_boxplot(list_after_2019.sin, "Singapore", "Beds")
generate_price_boxplot(list_after_2019.sin, "Singapore", "neighbourhood_cleansed")
generate_price_boxplot(list_after_2019.sin, "Singapore", "Reviews")
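The points drawn beyond the whiskers in these box plots follow the default rule of lying more than 1.5 times the box height away from the box. The same rule can be applied numerically with `boxplot.stats`; the sketch below uses made-up prices, with 2500 standing in for an extreme listing.

```r
# Toy prices standing in for listing$Price.
toy_prices <- c(100, 110, 120, 130, 2500)

# boxplot.stats() applies the same whisker rule that boxplot() draws.
flagged <- boxplot.stats(toy_prices)$out
flagged  # 2500
```

Applied to the real `Price` column, this would return the candidate outlier prices, which can then be looked up in the listing table.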
It seems that a single boat in the Bukit Merah area (possibly next to the marina at Keppel Bay) has a very high price, at $2500/night. Let’s look at that one more closely.
# head(list_after_2019.sin %>% arrange(desc(Price)))
# head(list_after_2019.sin %>% arrange(desc(reviews_since_2019)))
head(list_after_2019.sin %>% filter(Property_Type == "Boat") %>% arrange (desc(reviews_since_2019)))
## id Price Reviews Beds Baths Capacity Monthly_Reviews Property_Type
## 1 31527262 344 217 1 NA 2 6.24 Boat
## 2 37907711 199 177 3 NA 4 6.23 Boat
## 3 20247516 2500 55 4 NA 5 1.05 Boat
## 4 50433019 288 7 2 NA 5 2.50 Boat
## Room_Type Rating neighbourhood_cleansed host_response_time
## 1 Entire home/apt 4.94 Southern Islands within an hour
## 2 Entire home/apt 4.49 Punggol within an hour
## 3 Entire home/apt 4.74 Bukit Merah within a day
## 4 Entire home/apt 5.00 Punggol within a few hours
## host_response_rate host_acceptance_rate host_Superhost latitude longitude
## 1 <NA> 1.00 1 1.24535 103.8387
## 2 <NA> 0.96 0 1.41585 103.9001
## 3 <NA> 0.98 0 1.26520 103.8190
## 4 <NA> 0.92 0 1.41480 103.8986
## amenities
## 1 Toaster,Sound system,Hangers,Bed linens,Hot water kettle,Coffee maker,Carbon monoxide alarm,Hair dryer,TV,Outdoor furniture,Dining table,Security cameras on property,Outdoor dining area,Private entrance,Refrigerator,Microwave,Waterfront,Wifi,Smoke alarm,Shampoo,Portable fans,Paid parking off premises,Hot water,Mini fridge,Essentials,First aid kit,Dishes and silverware,Air conditioning,Shower gel,Fire extinguisher
## 2 Sound system,Hangers,Bed linens,Coffee maker,Hair dryer,TV,Paid parking on premises,Lockbox,Room-darkening shades,Security cameras on property,Long term stays allowed,Patio or balcony,Pour-over coffee,Microwave,Waterfront,Wifi,Smoke alarm,Shampoo,Extra pillows and blankets,Hot water,Mini fridge,Essentials,Kitchen,EV charger,First aid kit,Air conditioning,Shower gel,Fire extinguisher
## 3 Shampoo,Essentials,Long term stays allowed,Carbon monoxide alarm,Hair dryer,Host greets you,Hot water,Pool,TV,Paid parking on premises,Air conditioning,Smoke alarm,Fire extinguisher
## 4 Toaster,Bidet,Sound system,Rice maker,Hangers,Bed linens,Hot water kettle,Cooking basics,Freezer,Washer,Bathtub,Hair dryer,TV,Outdoor furniture,Dining table,Dedicated workspace,Free parking on premises,Lockbox,Cleaning products,Clothing storage: closet,dresser,and wardrobe,Long term stays allowed,Outdoor dining area,Private entrance,Pocket wifi,Refrigerator,Waterfront,Wifi,Smoke alarm,Induction stove,Dishwasher,Extra pillows and blankets,Portable fans,Hot water,Mini fridge,Iron,Essentials,Boat slip,Kitchen,EV charger,First aid kit,Air conditioning,Dishes and silverware,Fire extinguisher
## last_review no_of_am Amenities_Wifi Amenities_Shampoo Amenities_Kitchen
## 1 2021-09-15 30 1 1 0
## 2 2021-12-12 28 1 1 1
## 3 2020-07-28 13 0 1 0
## 4 2021-12-07 45 1 0 1
## Amenities_Long_Term Amenities_Washer Amenities_HairDryer Amenities_HotWater
## 1 0 0 1 1
## 2 1 0 1 1
## 3 1 0 1 1
## 4 1 1 1 1
## Amenities_TV Amenities_AC
## 1 1 1
## 2 1 1
## 3 1 1
## 4 1 1
## host_verifications
## 1 'email','phone','jumio','offline_government_id','selfie','government_id','identity_manual'
## 2 'email','phone','jumio','offline_government_id','selfie','government_id','identity_manual'
## 3 'email','phone','reviews','jumio','selfie','government_id','identity_manual'
## 4 'phone'
## hv_email hv_phone hv_facebook hv_reviews hv_manual_offline hv_manual_jumio
## 1 1 1 0 0 0 1
## 2 1 1 0 0 0 1
## 3 1 1 0 1 0 1
## 4 0 1 0 0 0 0
## hv_manual_off_gov hv_manual_gov hv_manual_work_email no_of_vf
## 1 1 1 0 7
## 2 1 1 0 7
## 3 0 1 0 7
## 4 0 0 0 1
## host_response_hours 1 0 host_response_day host_response_few_days
## 1 NA 1 0 NA NA
## 2 NA 1 0 NA NA
## 3 NA 1 0 NA NA
## 4 NA 1 0 NA NA
## Days_since_last_review Capacity_Sqr Beds_Sqr Baths_Sqr ln_Price ln_Beds
## 1 107 4 1 NA 5.843544 0.6931472
## 2 19 16 9 NA 5.298317 1.3862944
## 3 521 25 16 NA 7.824446 1.6094379
## 4 24 25 4 NA 5.666427 1.0986123
## ln_Baths ln_Capacity ln_Rating Shared_ind House_ind Private_ind
## 1 NA 1.098612 1.781709 0 1 0
## 2 NA 1.609438 1.702928 0 1 0
## 3 NA 1.791759 1.747459 0 1 0
## 4 NA 1.791759 1.791759 0 1 0
## Capacity_x_Shared_ind H_Cap P_Cap ln_Capacity_x_Shared_ind
## 1 0 2 0 0
## 2 0 4 0 0
## 3 0 5 0 0
## 4 0 5 0 0
## ln_Capacity_x_House_ind ln_Capacity_x_Private_ind reviews_since_2019
## 1 1.098612 0 217
## 2 1.609438 0 177
## 3 1.791759 0 26
## 4 1.791759 0 7
## bookings_since_2019 earnings_since_2019 nb_group_1 nb_group_2 nb_group_3
## 1 434 447888 1 0 0
## 2 354 211338 0 1 0
## 3 52 390000 0 0 0
## 4 14 12096 0 1 0
## nb_group_4 nb_group_5
## 1 0 0
## 2 0 0
## 3 0 1
## 4 0 0
head(list_after_2019.sin %>% group_by(id, Property_Type, bookings_since_2019) %>% summarise(percent_of_total = bookings_since_2019*100/sum(list_after_2019.sin$bookings_since_2019)) %>% filter(Property_Type == "Boat") %>% arrange (desc(bookings_since_2019)))
## `summarise()` has grouped output by 'id', 'Property_Type'. You can override
## using the `.groups` argument.
## # A tibble: 4 × 4
## # Groups: id, Property_Type [4]
## id Property_Type bookings_since_2019 percent_of_total
## <int> <chr> <dbl> <dbl>
## 1 31527262 Boat 434 0.967
## 2 37907711 Boat 354 0.788
## 3 20247516 Boat 52 0.116
## 4 50433019 Boat 14 0.0312
There are four boats listed on Airbnb Singapore. Together, they account for roughly 1.9% of all bookings since 2019.
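The same share can be computed for every property type at once rather than for boats alone. The following is a minimal sketch on made-up data: `toy_listings` and its values are hypothetical and only mirror the `Property_Type` and `bookings_since_2019` columns used above.

```r
library(dplyr)

# Hypothetical toy data standing in for list_after_2019.sin
toy_listings <- data.frame(
  id = 1:6,
  Property_Type = c("Boat", "Boat", "Entire rental unit",
                    "Entire rental unit", "Private room", "Private room"),
  bookings_since_2019 = c(10, 30, 200, 160, 350, 250)
)

# Total bookings per property type, expressed as a percentage of all bookings
share_by_type <- toy_listings %>%
  group_by(Property_Type) %>%
  summarise(bookings = sum(bookings_since_2019), .groups = "drop") %>%
  mutate(percent_of_total = 100 * bookings / sum(bookings)) %>%
  arrange(desc(percent_of_total))

print(share_by_type)
```

Aggregating before computing the percentage keeps one row per property type, so the shares sum to 100 and can be compared directly.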
list_of_vars = c("earnings_since_2019","Rating", "Reviews", "Beds", "Capacity", "host_acceptance_rate", "host_Superhost","Amenities_Wifi","Amenities_Shampoo","Amenities_Kitchen","Amenities_Long_Term","Amenities_Washer","Amenities_HairDryer", "Amenities_HotWater", "Amenities_TV", "Amenities_AC", "hv_email", "hv_reviews", "Shared_ind", "House_ind", "Private_ind")#, "reviews_since_2019","bookings_since_2019") #, "hood_factor")
# list_after_2019.sin %>% select_(.dots = c(list_of_vars), "Price")
vars_list.sin = list_after_2019.sin %>% select_(.dots = c(list_of_vars,"Price")) %>% na.omit()
## Warning: `select_()` was deprecated in dplyr 0.7.0.
## Please use `select()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated.
vars_list.tpe = list_after_2019.tpe %>% select_(.dots = c(list_of_vars,"Price")) %>% na.omit()
vars_list.hkg = list_after_2019.hkg %>% select_(.dots = c(list_of_vars,"Price")) %>% na.omit()
vars_list.nrt = list_after_2019.nrt %>% select_(.dots = c(list_of_vars,"Price")) %>% na.omit()
# vars_list.sin
paint_corrleations <- function(listing)
{
# chart.Correlation(listing, histogram=TRUE, pch=19)
corrplot::corrplot(cor(listing, use = "complete.obs"), method="square", type="lower")
}
paint_corrleations(vars_list.sin)
paint_corrleations(vars_list.nrt)
paint_corrleations(vars_list.tpe)
paint_corrleations(vars_list.hkg)
Principal Components Regression (PCR) finds M linear combinations (“principal components”) of our predictors and then uses least squares to fit a linear regression of the response on those components.
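To make the two steps concrete, here is a minimal sketch in base R on simulated data. It is not the project's model: the variables `x1`, `x2`, `x3`, `y` and the choice `M = 2` are assumptions made purely for illustration.

```r
set.seed(1)

# Simulated predictors; x2 is deliberately almost collinear with x1
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
y  <- 3 * x1 + 2 * x3 + rnorm(n)
X  <- cbind(x1, x2, x3)

# Step 1: principal components of the standardised predictors
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Step 2: ordinary least squares on the scores of the first M components
M       <- 2
scores  <- pca$x[, 1:M, drop = FALSE]
pcr_fit <- lm(y ~ scores)

# Cumulative share of predictor variance captured, and fit quality for y
var_explained_X <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
r2 <- summary(pcr_fit)$r.squared
print(round(var_explained_X, 3))
print(round(r2, 3))
```

The two printed quantities correspond to the "X" and response rows of the "% variance explained" table that `summary()` prints for a `pcr` fit.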
set.seed(1)
pcr_model <- function (listings, city)
{
# Principal components regression of Price on standardised predictors, with cross-validation
fit <- pcr(Price ~ reviews_since_2019 + Rating + host_acceptance_rate + host_Superhost + Shared_ind + House_ind + Private_ind + Amenities_Wifi + hv_email, data=listings, scale=TRUE, validation="CV")
summary(fit)
plot(fit)
validationplot(fit, val.type="MSEP")
validationplot(fit, val.type="R2")
# Caution: predict(fit) returns predictions for every number of components, and its length
# differs from listings$Price (hence the recycling warning), so this MAE is only indicative
print(paste("MAE for", city,":", mae(listings$Price, predict(fit))))
return (fit)
}
pcr_model.sin <-pcr_model(list_after_2019.sin, "Singapore")
## Data: X dimension: 1405 9
## Y dimension: 1405 1
## Fit method: svdpc
## Number of components considered: 9
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 152.6 142.9 142.1 141.6 142.4 142.2 142.3
## adjCV 152.6 142.9 142.0 141.6 142.3 142.1 142.2
## 7 comps 8 comps 9 comps
## CV 140.8 140.7 140.6
## adjCV 140.7 140.6 140.5
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 22.14 40.73 53.25 64.19 74.57 84.27 92.71 100.00
## Price 12.89 13.94 14.45 14.45 15.23 15.23 16.90 17.21
## 9 comps
## X 100.00
## Price 17.21
## Warning in actual - predicted: longer object length is not a multiple of shorter
## object length
## [1] "MAE for Singapore : 108.772298292697"
pcr_model.hkg <-pcr_model(list_after_2019.hkg, "Hong Kong")
## Data: X dimension: 1945 9
## Y dimension: 1945 1
## Fit method: svdpc
## Number of components considered: 9
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 2455 2455 2455 2454 2455 2456 2457
## adjCV 2455 2455 2455 2454 2455 2456 2456
## 7 comps 8 comps 9 comps
## CV 2458 2455 2455
## adjCV 2457 2454 2455
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 21.7730 37.2315 50.3304 62.1254 72.9752 83.0920 91.9797 100.0000
## Price 0.2567 0.2572 0.4119 0.4395 0.4482 0.5174 0.5345 0.8255
## 9 comps
## X 100.0000
## Price 0.8256
## Warning in actual - predicted: longer object length is not a multiple of shorter
## object length
## [1] "MAE for Hong Kong : 869.547093791851"
pcr_model.tpe <-pcr_model(list_after_2019.tpe, "Taipei")
## Data: X dimension: 2175 9
## Y dimension: 2175 1
## Fit method: svdpc
## Number of components considered: 9
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 4229 4170 4170 4153 4154 4156 4148
## adjCV 4229 4170 4169 4153 4153 4156 4148
## 7 comps 8 comps 9 comps
## CV 4134 4131 4132
## adjCV 4133 4130 4131
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 24.408 39.493 52.740 64.281 74.915 84.836 93.121 100.000
## Price 2.953 3.029 3.792 3.795 3.795 4.154 4.932 5.071
## 9 comps
## X 100.000
## Price 5.078
## Warning in actual - predicted: longer object length is not a multiple of shorter
## object length
## [1] "MAE for Taipei : 2209.48295156008"
pcr_model.nrt <-pcr_model(list_after_2019.nrt, "Tokyo")
## Data: X dimension: 7249 9
## Y dimension: 7249 1
## Fit method: svdpc
## Number of components considered: 9
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 29610 29516 29490 29496 29487 29489 29492
## adjCV 29610 29516 29489 29495 29486 29488 29491
## 7 comps 8 comps 9 comps
## CV 29492 29494 29495
## adjCV 29490 29492 29493
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps 7 comps 8 comps
## X 22.2622 37.4091 49.4255 61.1240 71.8952 82.0212 91.8955 100.0000
## Price 0.6487 0.8555 0.8623 0.9325 0.9325 0.9327 0.9626 0.9718
## 9 comps
## X 100.0000
## Price 0.9733
## Warning in actual - predicted: longer object length is not a multiple of shorter
## object length
## [1] "MAE for Tokyo : 12041.061870605"
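The mean absolute error differs widely by city, presumably because prices are in each city's local currency: about 109 for Singapore, 870 for Hong Kong, 2,209 for Taipei and 12,041 for Tokyo. The warning printed with each figure indicates that the actual and predicted vectors differ in length, so these values should be read as rough indications only.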
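The recycling warning in the output above arises because `predict()` on a `pcr` fit returns an array with one slice per number of components, not a single vector. A minimal sketch of computing the MAE for one chosen number of components follows. It assumes the `pls` package is installed; the data frame `toy` and the choice `k = 2` are hypothetical.

```r
library(pls)

set.seed(1)

# Hypothetical toy data; na.omit() keeps actual and predicted rows aligned
toy   <- data.frame(x1 = rnorm(100), x2 = rnorm(100), x3 = rnorm(100))
toy$y <- 2 * toy$x1 - toy$x2 + rnorm(100)
toy   <- na.omit(toy)

fit <- pcr(y ~ x1 + x2 + x3, data = toy, scale = TRUE, validation = "CV")

# Without ncomp, predictions come back as a 3-d array (observations x responses x components)
print(dim(predict(fit)))

# Predictions for a single number of components, flattened to a plain vector
k      <- 2
pred_k <- drop(predict(fit, newdata = toy, ncomp = k))

# Mean absolute error on the same rows that were used for fitting
mae_k <- mean(abs(toy$y - pred_k))
print(mae_k)
```

In practice `k` would be chosen from the cross-validated RMSEP table rather than fixed in advance.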
Price is a fixed value for each listing in the dataset and is in fact recommended by Airbnb itself. It therefore makes more sense to model earnings rather than price, since earnings also depend on the number of bookings achieved at a given price.
set.seed(1)
pcr_model_earnings <- function (listings, city)
{
# Principal components regression of earnings on Price and the other standardised predictors
fit <- pcr(earnings_since_2019 ~ Price + reviews_since_2019 + Rating + host_acceptance_rate + host_Superhost + Shared_ind + House_ind + Private_ind + Amenities_Wifi + hv_email, data=listings, scale=TRUE, validation="CV")
summary(fit)
plot(fit)
validationplot(fit, val.type="MSEP")
validationplot(fit, val.type="R2")
# Caution: predict(fit) returns predictions for every number of components, and its length
# differs from listings$earnings_since_2019 (hence the recycling warning), so this MAE is only indicative
print(paste("MAE for", city,":", mae(listings$earnings_since_2019, predict(fit))))
return (fit)
}
pcr_model.sin <-pcr_model_earnings(list_after_2019.sin, "Singapore")
## Data: X dimension: 1405 10
## Y dimension: 1405 1
## Fit method: svdpc
## Number of components considered: 10
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 31428 29557 28372 28288 25678 25378 22925
## adjCV 31428 29554 28366 28286 24823 25441 23114
## 7 comps 8 comps 9 comps 10 comps
## CV 17510 17465 16439 16436
## adjCV 17453 17414 16395 16391
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## X 22.14 38.99 50.4 60.24 69.93 78.66
## earnings_since_2019 11.70 18.99 19.6 37.12 41.06 52.10
## 7 comps 8 comps 9 comps 10 comps
## X 87.03 93.82 100.00 100.00
## earnings_since_2019 71.88 72.12 75.15 75.15
## Warning in actual - predicted: longer object length is not a multiple of shorter
## object length
## [1] "MAE for Singapore : 20085.8644280392"
pcr_model.hkg <-pcr_model_earnings(list_after_2019.hkg, "Hong Kong")
## Data: X dimension: 1945 10
## Y dimension: 1945 1
## Fit method: svdpc
## Number of components considered: 10
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 410901 402205 390090 377536 381124 376353 341603
## adjCV 410901 402288 390081 377417 381617 367444 338156
## 7 comps 8 comps 9 comps 10 comps
## CV 346454 318750 295779 295967
## adjCV 343001 315592 292494 292664
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## X 19.648 33.56 45.45 56.10 66.09 75.83
## earnings_since_2019 3.538 10.11 15.39 15.39 48.30 49.76
## 7 comps 8 comps 9 comps 10 comps
## X 84.87 92.86 100.00 100.00
## earnings_since_2019 49.91 58.56 65.66 65.66
## Warning in actual - predicted: longer object length is not a multiple of shorter
## object length
## [1] "MAE for Hong Kong : 200007.767176238"
pcr_model.tpe <-pcr_model_earnings(list_after_2019.tpe, "Taipei")
## Data: X dimension: 2175 10
## Y dimension: 2175 1
## Fit method: svdpc
## Number of components considered: 10
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 1181326 1104149 1103314 1071703 1072319 990895 976066
## adjCV 1181326 1104158 1103341 1071364 1073671 959962 972962
## 7 comps 8 comps 9 comps 10 comps
## CV 967203 780310 753804 752060
## adjCV 964362 776832 750544 748890
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## X 22.49 36.10 48.35 58.74 68.46 78.03
## earnings_since_2019 12.47 12.77 18.39 18.73 38.85 38.90
## 7 comps 8 comps 9 comps 10 comps
## X 86.66 93.84 100.00 100.00
## earnings_since_2019 42.44 62.81 65.52 65.52
## Warning in actual - predicted: longer object length is not a multiple of shorter
## object length
## [1] "MAE for Taipei : 621874.28842328"
pcr_model.nrt <-pcr_model_earnings(list_after_2019.nrt, "Tokyo")
## Data: X dimension: 7249 10
## Y dimension: 7249 1
## Fit method: svdpc
## Number of components considered: 10
##
## VALIDATION: RMSEP
## Cross-validated using 10 random segments.
## (Intercept) 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## CV 7764047 7613827 7340841 7216238 7147811 5899329 5450582
## adjCV 7764047 7613976 7341342 7222899 7149043 5652462 5440735
## 7 comps 8 comps 9 comps 10 comps
## CV 4892522 4785782 4762564 4762764
## adjCV 4863554 4777938 4754861 4755034
##
## TRAINING: % variance explained
## 1 comps 2 comps 3 comps 4 comps 5 comps 6 comps
## X 20.165 33.87 44.69 55.31 65.04 74.74
## earnings_since_2019 3.815 10.69 14.11 15.84 50.72 53.89
## 7 comps 8 comps 9 comps 10 comps
## X 83.85 92.71 100.00 100.00
## earnings_since_2019 63.09 64.32 64.64 64.64
## Warning in actual - predicted: longer object length is not a multiple of shorter
## object length
## [1] "MAE for Tokyo : 3905051.77520906"
We have two ways of cleaning:
* **clean_subset_including**: Selects the variables we want (I used this for stepwise regression, since the built-in stepwise regression function automatically creates dummy variables)
* **clean_subset_excluding**: Drops the variables we don't want (I used this for Lasso, since there are a lot of dummy variables and I would rather exclude those that are not needed)
### For checking number of missing data
# md.pattern(list_after_2019.country)
clean_subset_including <- function(list_after_2019.country) {
### Arbitrary selection of a list of variables
selecting_columns <- list_after_2019.country[,c("Reviews","Beds","Capacity","Monthly_Reviews","Property_Type","Room_Type","Rating","neighbourhood_cleansed","host_response_time","host_acceptance_rate","host_Superhost","no_of_am","Amenities_Wifi","Amenities_Shampoo","Amenities_Kitchen","Amenities_Long_Term","Amenities_Washer","Amenities_HairDryer","Amenities_HotWater","Amenities_TV","Amenities_AC","hv_email","hv_phone","hv_facebook","hv_reviews","hv_manual_offline","hv_manual_jumio","hv_manual_off_gov","hv_manual_gov","hv_manual_work_email","no_of_vf","Days_since_last_review","Capacity_Sqr","Beds_Sqr","ln_Beds","ln_Capacity","ln_Rating","Shared_ind","House_ind","Private_ind","Capacity_x_Shared_ind","H_Cap","P_Cap","ln_Capacity_x_Shared_ind","ln_Capacity_x_House_ind","ln_Capacity_x_Private_ind","reviews_since_2019","bookings_since_2019", "earnings_since_2019","nb_group_1","nb_group_2","nb_group_3","nb_group_4","nb_group_5" )]
### Removing rows with blanks instead of imputing
selecting_columns <- na.omit(selecting_columns)
selecting_columns$Property_Type <- as.factor(selecting_columns$Property_Type)
selecting_columns$neighbourhood_cleansed <- as.factor(selecting_columns$neighbourhood_cleansed )
selecting_columns$host_response_time <- as.factor(selecting_columns$host_response_time)
return(selecting_columns)
}
###############################################################################################################
clean_subset_excluding <- function(list_after_2019.country) {
selecting_columns <- list_after_2019.country[,!names(list_after_2019.country) %in% c('id','Price','ln_Price','host_response_time','host_response_rate','host_verifications','Baths','Baths_Sqr','ln_Baths','latitude','longitude','neighbourhood_cleansed','amenities','last_review','1','0','host_response_hours','host_response_day','host_response_few_days')]
selecting_columns <- na.omit(selecting_columns)
return(selecting_columns)
}
###############################################################################################################
# For Stepwise Regression Input
list_after_2019.sin_step <- clean_subset_including(list_after_2019.sin)
list_after_2019.hkg_step <- clean_subset_including(list_after_2019.hkg)
list_after_2019.nrt_step <- clean_subset_including(list_after_2019.nrt)
list_after_2019.tpe_step <- clean_subset_including(list_after_2019.tpe)
# For Lasso Regression Input
list_after_2019.sin_clean <- clean_subset_excluding(list_after_2019.sin_remove)
list_after_2019.hkg_clean <- clean_subset_excluding(list_after_2019.hkg_remove)
list_after_2019.nrt_clean <- clean_subset_excluding(list_after_2019.nrt_remove)
list_after_2019.tpe_clean <- clean_subset_excluding(list_after_2019.tpe_remove)
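For Singapore, the backward stepwise model below reaches a multiple R-squared of about 0.78 (adjusted 0.77). The other cities fit much less well, at about 0.42 for Hong Kong, 0.29 for Tokyo and 0.21 for Taipei.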
stepwise_regression_model <- function(list_after_2019.country_step) {
#Define Smallest and Full Model
minmod = lm(earnings_since_2019~1, data = list_after_2019.country_step)
fullmod = lm(earnings_since_2019~. , data = list_after_2019.country_step)
# BIC penalty: k = log(nobs(fullmod)); for AIC use k = 2
backward_regression_model <- step(fullmod, scope = list(lower = minmod, upper = fullmod),direction = "backward", k=log(nobs(fullmod)), trace=F)
return (backward_regression_model)
}
summary(stepwise_regression_model(list_after_2019.sin_step))
##
## Call:
## lm(formula = earnings_since_2019 ~ Reviews + Monthly_Reviews +
## Property_Type + host_Superhost + Amenities_Wifi + hv_manual_gov +
## no_of_vf + H_Cap + ln_Capacity_x_House_ind + reviews_since_2019,
## data = list_after_2019.country_step)
##
## Residuals:
## Min 1Q Median 3Q Max
## -155312 -4254 279 3395 189765
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 187433.62 10354.21 18.102
## Reviews -85.69 18.28 -4.689
## Monthly_Reviews -4210.84 937.01 -4.494
## Property_TypeCampsite -175695.52 12187.31 -14.416
## Property_TypeEntire condominium (condo) -155254.55 7979.33 -19.457
## Property_TypeEntire guest suite -172580.58 13310.45 -12.966
## Property_TypeEntire guesthouse -156938.24 13399.55 -11.712
## Property_TypeEntire loft -157620.54 11799.96 -13.358
## Property_TypeEntire place -152593.14 11862.39 -12.864
## Property_TypeEntire rental unit -158833.97 7994.67 -19.867
## Property_TypeEntire residential home -161667.85 8863.15 -18.240
## Property_TypeEntire serviced apartment -154620.77 8055.14 -19.195
## Property_TypeEntire townhouse -162330.65 13318.70 -12.188
## Property_TypePrivate room -171680.40 11870.51 -14.463
## Property_TypePrivate room in bed and breakfast -167769.83 11328.35 -14.810
## Property_TypePrivate room in bungalow -166950.33 10560.68 -15.809
## Property_TypePrivate room in condominium (condo) -167160.08 9868.85 -16.938
## Property_TypePrivate room in guest suite -166760.52 18033.46 -9.247
## Property_TypePrivate room in hostel -163602.33 10778.25 -15.179
## Property_TypePrivate room in loft -171014.90 11790.56 -14.504
## Property_TypePrivate room in rental unit -169044.91 9787.60 -17.271
## Property_TypePrivate room in residential home -173183.61 9823.84 -17.629
## Property_TypePrivate room in serviced apartment -164442.30 10390.40 -15.826
## Property_TypePrivate room in townhouse -170533.80 10070.03 -16.935
## Property_TypePrivate room in villa -169661.60 11300.51 -15.014
## Property_TypeRoom in aparthotel -167088.38 18016.06 -9.274
## Property_TypeRoom in boutique hotel -164189.59 9759.79 -16.823
## Property_TypeRoom in hotel -162818.84 9986.37 -16.304
## Property_TypeShared room -169009.50 11598.85 -14.571
## Property_TypeShared room in bed and breakfast -168024.94 10779.65 -15.587
## Property_TypeShared room in boutique hotel -167198.96 13110.99 -12.753
## Property_TypeShared room in hostel -169297.87 10273.78 -16.479
## Property_TypeShared room in rental unit -179732.59 11831.39 -15.191
## Property_TypeShared room in residential home -165368.85 14510.77 -11.396
## Property_TypeTent -178738.36 17365.81 -10.293
## Property_TypeTiny house -119439.31 13303.70 -8.978
## host_Superhost 3908.46 919.67 4.250
## Amenities_Wifi -15441.34 4298.87 -3.592
## hv_manual_gov 7210.26 1464.49 4.923
## no_of_vf -2361.64 361.54 -6.532
## H_Cap 8883.74 1258.71 7.058
## ln_Capacity_x_House_ind -24416.98 6518.16 -3.746
## reviews_since_2019 1176.44 38.33 30.696
## Pr(>|t|)
## (Intercept) < 2e-16 ***
## Reviews 3.03e-06 ***
## Monthly_Reviews 7.60e-06 ***
## Property_TypeCampsite < 2e-16 ***
## Property_TypeEntire condominium (condo) < 2e-16 ***
## Property_TypeEntire guest suite < 2e-16 ***
## Property_TypeEntire guesthouse < 2e-16 ***
## Property_TypeEntire loft < 2e-16 ***
## Property_TypeEntire place < 2e-16 ***
## Property_TypeEntire rental unit < 2e-16 ***
## Property_TypeEntire residential home < 2e-16 ***
## Property_TypeEntire serviced apartment < 2e-16 ***
## Property_TypeEntire townhouse < 2e-16 ***
## Property_TypePrivate room < 2e-16 ***
## Property_TypePrivate room in bed and breakfast < 2e-16 ***
## Property_TypePrivate room in bungalow < 2e-16 ***
## Property_TypePrivate room in condominium (condo) < 2e-16 ***
## Property_TypePrivate room in guest suite < 2e-16 ***
## Property_TypePrivate room in hostel < 2e-16 ***
## Property_TypePrivate room in loft < 2e-16 ***
## Property_TypePrivate room in rental unit < 2e-16 ***
## Property_TypePrivate room in residential home < 2e-16 ***
## Property_TypePrivate room in serviced apartment < 2e-16 ***
## Property_TypePrivate room in townhouse < 2e-16 ***
## Property_TypePrivate room in villa < 2e-16 ***
## Property_TypeRoom in aparthotel < 2e-16 ***
## Property_TypeRoom in boutique hotel < 2e-16 ***
## Property_TypeRoom in hotel < 2e-16 ***
## Property_TypeShared room < 2e-16 ***
## Property_TypeShared room in bed and breakfast < 2e-16 ***
## Property_TypeShared room in boutique hotel < 2e-16 ***
## Property_TypeShared room in hostel < 2e-16 ***
## Property_TypeShared room in rental unit < 2e-16 ***
## Property_TypeShared room in residential home < 2e-16 ***
## Property_TypeTent < 2e-16 ***
## Property_TypeTiny house < 2e-16 ***
## host_Superhost 2.29e-05 ***
## Amenities_Wifi 0.000340 ***
## hv_manual_gov 9.57e-07 ***
## no_of_vf 9.20e-11 ***
## H_Cap 2.71e-12 ***
## ln_Capacity_x_House_ind 0.000187 ***
## reviews_since_2019 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 15170 on 1330 degrees of freedom
## Multiple R-squared: 0.7785, Adjusted R-squared: 0.7715
## F-statistic: 111.3 on 42 and 1330 DF, p-value: < 2.2e-16
summary(stepwise_regression_model(list_after_2019.hkg_step))
##
## Call:
## lm(formula = earnings_since_2019 ~ Reviews + neighbourhood_cleansed +
## hv_manual_jumio + hv_manual_gov + Days_since_last_review +
## H_Cap + reviews_since_2019, data = list_after_2019.country_step)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4231374 -56096 -8498 29868 8234314
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -50166.34 26759.51 -1.875 0.060986 .
## Reviews -691.75 183.45 -3.771 0.000168 ***
## neighbourhood_cleansedEastern -1312.14 46076.05 -0.028 0.977284
## neighbourhood_cleansedIslands -6024.28 31442.03 -0.192 0.848076
## neighbourhood_cleansedKowloon City -6505.92 42112.57 -0.154 0.877241
## neighbourhood_cleansedKwai Tsing 26204.03 225261.95 0.116 0.907406
## neighbourhood_cleansedKwun Tong -31013.45 184191.07 -0.168 0.866305
## neighbourhood_cleansedNorth -29173.70 46986.46 -0.621 0.534743
## neighbourhood_cleansedSai Kung 122437.57 55735.96 2.197 0.028159 *
## neighbourhood_cleansedSha Tin 10976.65 75705.09 0.145 0.884732
## neighbourhood_cleansedSham Shui Po -30313.31 69380.60 -0.437 0.662224
## neighbourhood_cleansedSouthern 35842.82 68099.71 0.526 0.598721
## neighbourhood_cleansedTai Po 67172.51 80117.87 0.838 0.401900
## neighbourhood_cleansedTsuen Wan 4233457.53 184206.47 22.982 < 2e-16 ***
## neighbourhood_cleansedTuen Mun 47598.86 102477.90 0.464 0.642358
## neighbourhood_cleansedWan Chai -34554.08 29100.74 -1.187 0.235220
## neighbourhood_cleansedWong Tai Sin -2356.44 317666.64 -0.007 0.994082
## neighbourhood_cleansedYau Tsim Mong -22565.15 24172.81 -0.933 0.350684
## neighbourhood_cleansedYuen Long -42325.19 46028.10 -0.920 0.357924
## hv_manual_jumio 160848.12 37594.72 4.278 1.98e-05 ***
## hv_manual_gov -184078.09 37774.85 -4.873 1.19e-06 ***
## Days_since_last_review 86.50 23.13 3.739 0.000190 ***
## H_Cap 18941.67 2761.51 6.859 9.34e-12 ***
## reviews_since_2019 7585.89 396.22 19.146 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 316900 on 1898 degrees of freedom
## Multiple R-squared: 0.4187, Adjusted R-squared: 0.4116
## F-statistic: 59.43 on 23 and 1898 DF, p-value: < 2.2e-16
summary(stepwise_regression_model(list_after_2019.nrt_step))
##
## Call:
## lm(formula = earnings_since_2019 ~ Reviews + Capacity + Monthly_Reviews +
## hv_reviews + Days_since_last_review + Capacity_Sqr + ln_Capacity +
## reviews_since_2019, data = list_after_2019.country_step)
##
## Residuals:
## Min 1Q Median 3Q Max
## -31027530 -1465859 -33262 900095 240476057
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1429357.3 965276.6 1.481 0.13871
## Reviews -12944.8 2960.5 -4.373 1.25e-05 ***
## Capacity 2210651.3 461045.1 4.795 1.66e-06 ***
## Monthly_Reviews 353356.1 98155.3 3.600 0.00032 ***
## hv_reviews -750024.7 177135.9 -4.234 2.32e-05 ***
## Days_since_last_review 1660.3 314.3 5.283 1.31e-07 ***
## Capacity_Sqr -48167.0 16128.4 -2.986 0.00283 **
## ln_Capacity -6679136.8 1592238.3 -4.195 2.76e-05 ***
## reviews_since_2019 130606.4 5078.2 25.719 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6606000 on 7063 degrees of freedom
## Multiple R-squared: 0.2914, Adjusted R-squared: 0.2906
## F-statistic: 363.1 on 8 and 7063 DF, p-value: < 2.2e-16
summary(stepwise_regression_model(list_after_2019.tpe_step))
##
## Call:
## lm(formula = earnings_since_2019 ~ hv_phone + H_Cap + ln_Capacity_x_House_ind +
## reviews_since_2019, data = list_after_2019.country_step)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3341907 -146373 11476 81599 33174950
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2925762.9 433853.5 6.744 1.98e-11 ***
## hv_phone -2966452.6 432318.3 -6.862 8.88e-12 ***
## H_Cap 191366.5 17877.8 10.704 < 2e-16 ***
## ln_Capacity_x_House_ind -445223.8 68206.3 -6.528 8.32e-11 ***
## reviews_since_2019 14139.8 839.1 16.851 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1056000 on 2140 degrees of freedom
## Multiple R-squared: 0.2118, Adjusted R-squared: 0.2103
## F-statistic: 143.7 on 4 and 2140 DF, p-value: < 2.2e-16
# Forward stepwise regression, kept commented out in case it is needed
# forward_regression = step(minmod, scope = list(lower = minmod, upper = fullmod),direction = "forward", k=log(nobs(fullmod)), trace=F)
# summary(forward_regression)
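As a self-contained illustration of how the penalty `k` passed to `step()` switches between BIC and AIC, here is a minimal sketch using R's built-in `mtcars` data. The data set and the response `mpg` are stand-ins chosen for illustration, not the project's listings.

```r
# Full model on the built-in mtcars data (stand-in for the listings data)
full_model <- lm(mpg ~ ., data = mtcars)
n <- nobs(full_model)

# Backward elimination with the BIC penalty: k = log(n)
backward_bic <- step(full_model, direction = "backward", k = log(n), trace = 0)

# Backward elimination with the AIC penalty: k = 2
backward_aic <- step(full_model, direction = "backward", k = 2, trace = 0)

# Compare which predictors each criterion keeps
print(formula(backward_bic))
print(formula(backward_aic))
```

Because `log(n)` exceeds 2 once `n` is larger than about 7, the BIC penalty charges more per parameter and tends to keep fewer predictors than AIC.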
# library(leaps)
backwardstep_leaps_sin <- regsubsets(earnings_since_2019~., data = list_after_2019.sin_step, nvmax = 5,method = "backward")
## Warning in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in =
## force.in, : 14 linear dependencies found
## Reordering variables and trying again:
backwardstep_leaps_hkg <- regsubsets(earnings_since_2019~., data = list_after_2019.hkg_step, nvmax = 5,method = "backward")
## Warning in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in =
## force.in, : 12 linear dependencies found
## Reordering variables and trying again:
backwardstep_leaps_nrt <- regsubsets(earnings_since_2019~., data = list_after_2019.nrt_step, nvmax = 5,method = "backward")
## Warning in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in =
## force.in, : 12 linear dependencies found
## Reordering variables and trying again:
backwardstep_leaps_tpe <- regsubsets(earnings_since_2019~., data = list_after_2019.tpe_step, nvmax = 5,method = "backward")
## Warning in leaps.setup(x, y, wt = wt, nbest = nbest, nvmax = nvmax, force.in =
## force.in, : 12 linear dependencies found
## Reordering variables and trying again:
summary(backwardstep_leaps_sin)$which
## (Intercept) Reviews Beds Capacity Monthly_Reviews Property_TypeCampsite
## 1 TRUE FALSE FALSE FALSE FALSE FALSE
## 2 TRUE FALSE FALSE FALSE FALSE FALSE
## 3 TRUE FALSE FALSE FALSE FALSE FALSE
## 4 TRUE FALSE FALSE FALSE FALSE FALSE
## 5 TRUE FALSE FALSE FALSE FALSE FALSE
## 6 TRUE FALSE FALSE FALSE FALSE FALSE
## Property_TypeEntire condominium (condo) Property_TypeEntire guest suite
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 TRUE FALSE
## 6 TRUE FALSE
## Property_TypeEntire guesthouse Property_TypeEntire loft
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeEntire place Property_TypeEntire rental unit
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE TRUE
## 4 FALSE TRUE
## 5 FALSE TRUE
## 6 FALSE TRUE
## Property_TypeEntire residential home Property_TypeEntire serviced apartment
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE TRUE
## Property_TypeEntire townhouse Property_TypePrivate room
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypePrivate room in bed and breakfast
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypePrivate room in bungalow
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypePrivate room in condominium (condo)
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypePrivate room in guest suite Property_TypePrivate room in hostel
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypePrivate room in loft Property_TypePrivate room in rental unit
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypePrivate room in residential home
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 TRUE
## 5 TRUE
## 6 TRUE
## Property_TypePrivate room in serviced apartment
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypePrivate room in townhouse Property_TypePrivate room in villa
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeRoom in aparthotel Property_TypeRoom in boutique hotel
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeRoom in hotel Property_TypeShared room
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeShared room in bed and breakfast
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypeShared room in boutique hotel Property_TypeShared room in hostel
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeShared room in rental unit
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypeShared room in residential home Property_TypeTent
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeTiny house Room_TypePrivate room Room_TypeEntire home/apt Rating
## 1 FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE
## neighbourhood_cleansedBedok neighbourhood_cleansedBishan
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedBukit Batok neighbourhood_cleansedBukit Merah
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedBukit Panjang neighbourhood_cleansedBukit Timah
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedCentral Water Catchment
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## neighbourhood_cleansedChoa Chu Kang neighbourhood_cleansedClementi
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedDowntown Core neighbourhood_cleansedGeylang
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedHougang neighbourhood_cleansedJurong East
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedJurong West neighbourhood_cleansedKallang
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedLim Chu Kang neighbourhood_cleansedMandai
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedMarina South neighbourhood_cleansedMarine Parade
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedMuseum neighbourhood_cleansedNewton
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedNovena neighbourhood_cleansedOrchard
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedOutram neighbourhood_cleansedPasir Ris
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedPunggol neighbourhood_cleansedQueenstown
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedRiver Valley neighbourhood_cleansedRochor
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedSembawang neighbourhood_cleansedSengkang
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedSerangoon neighbourhood_cleansedSingapore River
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedSouthern Islands neighbourhood_cleansedTampines
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedTanglin neighbourhood_cleansedToa Payoh
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedWoodlands neighbourhood_cleansedYishun
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## host_response_timeN/A host_response_timewithin a day
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## host_response_timewithin a few hours host_response_timewithin an hour
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## host_acceptance_rate host_Superhost no_of_am Amenities_Wifi Amenities_Shampoo
## 1 FALSE FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE FALSE
## Amenities_Kitchen Amenities_Long_Term Amenities_Washer Amenities_HairDryer
## 1 FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE
## Amenities_HotWater Amenities_TV Amenities_AC hv_email hv_phone hv_facebook
## 1 FALSE FALSE FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE FALSE FALSE
## hv_reviews hv_manual_offline hv_manual_jumio hv_manual_off_gov hv_manual_gov
## 1 FALSE FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE FALSE
## hv_manual_work_email no_of_vf Days_since_last_review Capacity_Sqr Beds_Sqr
## 1 FALSE FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE FALSE
## ln_Beds ln_Capacity ln_Rating Shared_ind House_ind Private_ind
## 1 FALSE FALSE FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE FALSE FALSE
## Capacity_x_Shared_ind H_Cap P_Cap ln_Capacity_x_Shared_ind
## 1 FALSE FALSE FALSE FALSE
## 2 FALSE TRUE FALSE FALSE
## 3 FALSE TRUE FALSE FALSE
## 4 FALSE TRUE FALSE FALSE
## 5 FALSE TRUE FALSE FALSE
## 6 FALSE TRUE FALSE FALSE
## ln_Capacity_x_House_ind ln_Capacity_x_Private_ind reviews_since_2019
## 1 FALSE FALSE TRUE
## 2 FALSE FALSE TRUE
## 3 FALSE FALSE TRUE
## 4 FALSE FALSE TRUE
## 5 FALSE FALSE TRUE
## 6 FALSE FALSE TRUE
## bookings_since_2019 nb_group_1 nb_group_2 nb_group_3 nb_group_4 nb_group_5
## 1 FALSE FALSE FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE FALSE FALSE
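The full `which` matrix is hard to scan because almost every entry is FALSE. A minimal sketch of a more compact view follows: a hypothetical helper, `selected_terms()`, that keeps only the names of the selected predictors for each model size. The stand-in matrix `toy_which` reproduces just two columns and the first two rows of the Singapore output above; it is not the full result.

```r
# Hypothetical helper: turn a regsubsets-style logical matrix into one
# character vector of selected predictors per model size.
selected_terms <- function(which_matrix) {
  keep <- colnames(which_matrix) != "(Intercept)"
  m <- which_matrix[, keep, drop = FALSE]
  lapply(
    stats::setNames(seq_len(nrow(m)), rownames(m)),
    function(i) colnames(m)[m[i, ]]
  )
}

# Small stand-in for summary(fit)$which (subset of columns and rows only).
toy_which <- matrix(
  c(TRUE, FALSE, TRUE,
    TRUE, TRUE,  TRUE),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("1", "2"), c("(Intercept)", "H_Cap", "reviews_since_2019"))
)

res <- selected_terms(toy_which)
print(res)
```

With the fitted objects in the session, `selected_terms(summary(backwardstep_leaps_sin)$which)` should give the same information as the printout above in a few lines.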
summary(backwardstep_leaps_hkg)$which
## (Intercept) Reviews Beds Capacity Monthly_Reviews Property_TypeCastle
## 1 TRUE FALSE FALSE FALSE FALSE FALSE
## 2 TRUE FALSE FALSE FALSE FALSE FALSE
## 3 TRUE FALSE FALSE FALSE FALSE FALSE
## 4 TRUE FALSE TRUE FALSE FALSE FALSE
## 5 TRUE TRUE TRUE FALSE FALSE FALSE
## 6 TRUE TRUE TRUE FALSE FALSE FALSE
## Property_TypeEarth house Property_TypeEntire bungalow
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeEntire chalet Property_TypeEntire condominium (condo)
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeEntire cottage Property_TypeEntire guest suite
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeEntire guesthouse Property_TypeEntire loft
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeEntire place Property_TypeEntire rental unit
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeEntire residential home Property_TypeEntire serviced apartment
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeEntire townhouse Property_TypeEntire villa
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeFarm stay Property_TypeHouseboat Property_TypeHut
## 1 FALSE FALSE FALSE
## 2 FALSE FALSE FALSE
## 3 FALSE FALSE FALSE
## 4 FALSE FALSE FALSE
## 5 FALSE FALSE FALSE
## 6 FALSE FALSE FALSE
## Property_TypePension Property_TypePrivate room
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypePrivate room in bed and breakfast
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypePrivate room in boat Property_TypePrivate room in bungalow
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypePrivate room in condominium (condo)
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypePrivate room in cottage Property_TypePrivate room in guest suite
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypePrivate room in guesthouse Property_TypePrivate room in hostel
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypePrivate room in loft Property_TypePrivate room in nature lodge
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypePrivate room in rental unit
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypePrivate room in residential home
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypePrivate room in serviced apartment
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypePrivate room in tiny house
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypePrivate room in townhouse Property_TypePrivate room in villa
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 TRUE FALSE
## 4 TRUE FALSE
## 5 TRUE FALSE
## 6 TRUE FALSE
## Property_TypeRoom in aparthotel Property_TypeRoom in boutique hotel
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeRoom in hotel Property_TypeShared room
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeShared room in bed and breakfast
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypeShared room in boat Property_TypeShared room in boutique hotel
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeShared room in condominium (condo)
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypeShared room in guest suite Property_TypeShared room in hostel
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeShared room in rental unit
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypeShared room in residential home
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypeShared room in serviced apartment
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypeShared room in townhouse Property_TypeTiny house
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Room_TypePrivate room Room_TypeEntire home/apt Rating
## 1 FALSE FALSE FALSE
## 2 FALSE FALSE FALSE
## 3 FALSE FALSE FALSE
## 4 FALSE FALSE FALSE
## 5 FALSE FALSE FALSE
## 6 FALSE FALSE FALSE
## neighbourhood_cleansedEastern neighbourhood_cleansedIslands
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedKowloon City neighbourhood_cleansedKwai Tsing
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedKwun Tong neighbourhood_cleansedNorth
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedSai Kung neighbourhood_cleansedSha Tin
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedSham Shui Po neighbourhood_cleansedSouthern
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedTai Po neighbourhood_cleansedTsuen Wan
## 1 FALSE FALSE
## 2 FALSE TRUE
## 3 FALSE TRUE
## 4 FALSE TRUE
## 5 FALSE TRUE
## 6 FALSE TRUE
## neighbourhood_cleansedTuen Mun neighbourhood_cleansedWan Chai
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedWong Tai Sin neighbourhood_cleansedYau Tsim Mong
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedYuen Long host_response_timeN/A
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## host_response_timewithin a day host_response_timewithin a few hours
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## host_response_timewithin an hour host_acceptance_rate host_Superhost no_of_am
## 1 FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE
## Amenities_Wifi Amenities_Shampoo Amenities_Kitchen Amenities_Long_Term
## 1 FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE
## Amenities_Washer Amenities_HairDryer Amenities_HotWater Amenities_TV
## 1 FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE
## Amenities_AC hv_email hv_phone hv_facebook hv_reviews hv_manual_offline
## 1 FALSE FALSE FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE FALSE FALSE
## hv_manual_jumio hv_manual_off_gov hv_manual_gov hv_manual_work_email no_of_vf
## 1 FALSE FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE FALSE
## 6 FALSE FALSE TRUE FALSE FALSE
## Days_since_last_review Capacity_Sqr Beds_Sqr ln_Beds ln_Capacity ln_Rating
## 1 FALSE FALSE FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE FALSE FALSE
## Shared_ind House_ind Private_ind Capacity_x_Shared_ind H_Cap P_Cap
## 1 FALSE FALSE FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE FALSE FALSE
## ln_Capacity_x_Shared_ind ln_Capacity_x_House_ind ln_Capacity_x_Private_ind
## 1 FALSE FALSE FALSE
## 2 FALSE FALSE FALSE
## 3 FALSE FALSE FALSE
## 4 FALSE FALSE FALSE
## 5 FALSE FALSE FALSE
## 6 FALSE FALSE FALSE
## reviews_since_2019 bookings_since_2019 nb_group_1 nb_group_2 nb_group_3
## 1 TRUE FALSE FALSE FALSE FALSE
## 2 TRUE FALSE FALSE FALSE FALSE
## 3 TRUE FALSE FALSE FALSE FALSE
## 4 TRUE FALSE FALSE FALSE FALSE
## 5 TRUE FALSE FALSE FALSE FALSE
## 6 TRUE FALSE FALSE FALSE FALSE
## nb_group_4 nb_group_5
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
summary(backwardstep_leaps_nrt)$which
## (Intercept) Reviews Beds Capacity Monthly_Reviews
## 1 TRUE FALSE FALSE FALSE FALSE
## 2 TRUE FALSE FALSE TRUE FALSE
## 3 TRUE FALSE FALSE TRUE FALSE
## 4 TRUE FALSE FALSE TRUE FALSE
## 5 TRUE FALSE FALSE TRUE FALSE
## 6 TRUE FALSE FALSE TRUE FALSE
## Property_TypeCasa particular Property_TypeEarth house
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeEntire bungalow Property_TypeEntire cabin
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeEntire condominium (condo) Property_TypeEntire guest suite
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeEntire guesthouse Property_TypeEntire hostel
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeEntire loft Property_TypeEntire place
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeEntire rental unit Property_TypeEntire residential home
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeEntire serviced apartment Property_TypeEntire townhouse
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeEntire vacation home Property_TypeEntire villa Property_TypeHut
## 1 FALSE FALSE FALSE
## 2 FALSE FALSE FALSE
## 3 FALSE FALSE FALSE
## 4 FALSE FALSE FALSE
## 5 FALSE TRUE FALSE
## 6 FALSE TRUE FALSE
## Property_TypePrivate room in bed and breakfast
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypePrivate room in condominium (condo)
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypePrivate room in guest suite
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypePrivate room in guesthouse Property_TypePrivate room in hostel
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypePrivate room in hut Property_TypePrivate room in rental unit
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypePrivate room in residential home
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypePrivate room in resort Property_TypePrivate room in ryokan
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypePrivate room in serviced apartment
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypePrivate room in tiny house
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypePrivate room in townhouse Property_TypePrivate room in villa
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeRoom in aparthotel Property_TypeRoom in boutique hotel
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeRoom in hotel Property_TypeShared room
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeShared room in aparthotel
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypeShared room in bed and breakfast
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypeShared room in boutique hotel Property_TypeShared room in hostel
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeShared room in hotel Property_TypeShared room in hut
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeShared room in rental unit
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypeShared room in residential home
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypeShared room in ryokan Property_TypeTiny house
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeTreehouse Room_TypePrivate room Room_TypeEntire home/apt Rating
## 1 FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE
## 3 TRUE FALSE FALSE FALSE
## 4 TRUE FALSE FALSE FALSE
## 5 TRUE FALSE FALSE FALSE
## 6 TRUE FALSE FALSE FALSE
## neighbourhood_cleansedAkiruno Shi neighbourhood_cleansedAkishima Shi
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedArakawa Ku neighbourhood_cleansedBunkyo Ku
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedChiyoda Ku neighbourhood_cleansedChofu Shi
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedChuo Ku neighbourhood_cleansedEdogawa Ku
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedFuchu Shi neighbourhood_cleansedHachioji Shi
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedHamura Shi neighbourhood_cleansedHigashikurume Shi
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedHigashimurayama Shi neighbourhood_cleansedHino Shi
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedItabashi Ku neighbourhood_cleansedKatsushika Ku
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedKita Ku neighbourhood_cleansedKodaira Shi
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedKoganei Shi neighbourhood_cleansedKokubunji Shi
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedKomae Shi neighbourhood_cleansedKoto Ku
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedKunitachi Shi neighbourhood_cleansedMachida Shi
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedMeguro Ku neighbourhood_cleansedMinato Ku
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedMitaka Shi neighbourhood_cleansedMusashimurayama Shi
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedMusashino Shi neighbourhood_cleansedNakano Ku
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedNerima Ku neighbourhood_cleansedNishitokyo Shi
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedOkutama Machi neighbourhood_cleansedOme Shi
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedOta Ku neighbourhood_cleansedSetagaya Ku
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedShibuya Ku neighbourhood_cleansedShinagawa Ku
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedShinjuku Ku neighbourhood_cleansedSuginami Ku
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedSumida Ku neighbourhood_cleansedTachikawa Shi
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedTaito Ku neighbourhood_cleansedTama Shi
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansedToshima Ku host_response_timeN/A
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## host_response_timewithin a day host_response_timewithin a few hours
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## host_response_timewithin an hour host_acceptance_rate host_Superhost no_of_am
## 1 FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE
## Amenities_Wifi Amenities_Shampoo Amenities_Kitchen Amenities_Long_Term
## 1 FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE
## Amenities_Washer Amenities_HairDryer Amenities_HotWater Amenities_TV
## 1 FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE
## Amenities_AC hv_email hv_phone hv_facebook hv_reviews hv_manual_offline
## 1 FALSE FALSE FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE TRUE FALSE
## 5 FALSE FALSE FALSE FALSE TRUE FALSE
## 6 FALSE FALSE FALSE FALSE TRUE FALSE
## hv_manual_jumio hv_manual_off_gov hv_manual_gov hv_manual_work_email no_of_vf
## 1 FALSE FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE FALSE
## Days_since_last_review Capacity_Sqr Beds_Sqr ln_Beds ln_Capacity ln_Rating
## 1 FALSE FALSE FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE FALSE FALSE
## 6 TRUE FALSE FALSE FALSE FALSE FALSE
## Shared_ind House_ind Private_ind Capacity_x_Shared_ind H_Cap P_Cap
## 1 FALSE FALSE FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE FALSE FALSE
## ln_Capacity_x_Shared_ind ln_Capacity_x_House_ind ln_Capacity_x_Private_ind
## 1 FALSE FALSE FALSE
## 2 FALSE FALSE FALSE
## 3 FALSE FALSE FALSE
## 4 FALSE FALSE FALSE
## 5 FALSE FALSE FALSE
## 6 FALSE FALSE FALSE
## reviews_since_2019 bookings_since_2019 nb_group_1 nb_group_2 nb_group_3
## 1 TRUE FALSE FALSE FALSE FALSE
## 2 TRUE FALSE FALSE FALSE FALSE
## 3 TRUE FALSE FALSE FALSE FALSE
## 4 TRUE FALSE FALSE FALSE FALSE
## 5 TRUE FALSE FALSE FALSE FALSE
## 6 TRUE FALSE FALSE FALSE FALSE
## nb_group_4 nb_group_5
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
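To compare cities, it can help to look at which predictors are shared across them. The sketch below transcribes the two-variable models by hand from the Singapore, Hong Kong and Tokyo outputs above (the list `two_var_models` is our own illustrative construct) and intersects them.

```r
# Predictors chosen in the two-variable model for each city, transcribed from
# the printed `which` matrices above.
two_var_models <- list(
  sin = c("H_Cap", "reviews_since_2019"),
  hkg = c("neighbourhood_cleansedTsuen Wan", "reviews_since_2019"),
  nrt = c("Capacity", "reviews_since_2019")
)

# Successively intersect the per-city vectors to keep only the shared terms.
common_terms <- Reduce(intersect, two_var_models)
print(common_terms)
```

In these three cities the only predictor shared at this model size is `reviews_since_2019`; the second predictor differs by city.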
summary(backwardstep_leaps_tpe)$which
## (Intercept) Reviews Beds Capacity Monthly_Reviews
## 1 TRUE FALSE FALSE FALSE FALSE
## 2 TRUE FALSE FALSE FALSE FALSE
## 3 TRUE FALSE FALSE FALSE FALSE
## 4 TRUE FALSE FALSE FALSE FALSE
## 5 TRUE FALSE FALSE FALSE FALSE
## 6 TRUE FALSE FALSE FALSE FALSE
## Property_TypeEntire condominium (condo) Property_TypeEntire guest suite
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeEntire guesthouse Property_TypeEntire loft
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeEntire place Property_TypeEntire rental unit
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeEntire residential home Property_TypeEntire serviced apartment
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeEntire townhouse Property_TypeEntire villa Property_TypeMinsu
## 1 FALSE FALSE FALSE
## 2 FALSE FALSE FALSE
## 3 FALSE FALSE FALSE
## 4 FALSE FALSE FALSE
## 5 FALSE FALSE FALSE
## 6 FALSE FALSE FALSE
## Property_TypePrivate room Property_TypePrivate room in bed and breakfast
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypePrivate room in bungalow
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypePrivate room in casa particular
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypePrivate room in condominium (condo)
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypePrivate room in guest suite
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypePrivate room in guesthouse Property_TypePrivate room in hostel
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypePrivate room in loft Property_TypePrivate room in minsu
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypePrivate room in rental unit
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypePrivate room in residential home
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypePrivate room in serviced apartment
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypePrivate room in townhouse Property_TypeRoom in aparthotel
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeRoom in boutique hotel Property_TypeRoom in hotel
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeShared room in bed and breakfast
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypeShared room in boutique hotel Property_TypeShared room in cave
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeShared room in condominium (condo)
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypeShared room in hostel Property_TypeShared room in loft
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Property_TypeShared room in rental unit
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypeShared room in residential home
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypeShared room in serviced apartment
## 1 FALSE
## 2 FALSE
## 3 FALSE
## 4 FALSE
## 5 FALSE
## 6 FALSE
## Property_TypeShared room in tent Property_TypeTiny house
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## Room_TypePrivate room Room_TypeEntire home/apt Rating
## 1 FALSE FALSE FALSE
## 2 FALSE FALSE FALSE
## 3 FALSE FALSE FALSE
## 4 FALSE FALSE FALSE
## 5 FALSE FALSE FALSE
## 6 FALSE FALSE FALSE
## neighbourhood_cleansed中正區 neighbourhood_cleansed信義區
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansed內湖區 neighbourhood_cleansed北投區
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE TRUE
## 6 FALSE TRUE
## neighbourhood_cleansed南港區 neighbourhood_cleansed士林區
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansed大同區 neighbourhood_cleansed大安區
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansed文山區 neighbourhood_cleansed松山區
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## neighbourhood_cleansed萬華區 host_response_timeN/A
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 TRUE FALSE
## host_response_timewithin a day host_response_timewithin a few hours
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
## host_response_timewithin an hour host_acceptance_rate host_Superhost no_of_am
## 1 FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE
## Amenities_Wifi Amenities_Shampoo Amenities_Kitchen Amenities_Long_Term
## 1 FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE
## Amenities_Washer Amenities_HairDryer Amenities_HotWater Amenities_TV
## 1 FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE
## Amenities_AC hv_email hv_phone hv_facebook hv_reviews hv_manual_offline
## 1 FALSE FALSE FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE FALSE FALSE
## 3 FALSE FALSE TRUE FALSE FALSE FALSE
## 4 FALSE FALSE TRUE FALSE FALSE FALSE
## 5 FALSE FALSE TRUE FALSE FALSE FALSE
## 6 FALSE FALSE TRUE FALSE FALSE FALSE
## hv_manual_jumio hv_manual_off_gov hv_manual_gov hv_manual_work_email no_of_vf
## 1 FALSE FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE FALSE
## Days_since_last_review Capacity_Sqr Beds_Sqr ln_Beds ln_Capacity ln_Rating
## 1 FALSE FALSE FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE FALSE FALSE
## 3 FALSE FALSE FALSE FALSE FALSE FALSE
## 4 FALSE FALSE FALSE FALSE FALSE FALSE
## 5 FALSE FALSE FALSE FALSE FALSE FALSE
## 6 FALSE FALSE FALSE FALSE FALSE FALSE
## Shared_ind House_ind Private_ind Capacity_x_Shared_ind H_Cap P_Cap
## 1 FALSE FALSE FALSE FALSE FALSE FALSE
## 2 FALSE FALSE FALSE FALSE TRUE FALSE
## 3 FALSE FALSE FALSE FALSE TRUE FALSE
## 4 FALSE FALSE FALSE FALSE TRUE FALSE
## 5 FALSE FALSE FALSE FALSE TRUE FALSE
## 6 FALSE FALSE FALSE FALSE TRUE FALSE
## ln_Capacity_x_Shared_ind ln_Capacity_x_House_ind ln_Capacity_x_Private_ind
## 1 FALSE FALSE FALSE
## 2 FALSE FALSE FALSE
## 3 FALSE FALSE FALSE
## 4 FALSE TRUE FALSE
## 5 FALSE TRUE FALSE
## 6 FALSE TRUE FALSE
## reviews_since_2019 bookings_since_2019 nb_group_1 nb_group_2 nb_group_3
## 1 TRUE FALSE FALSE FALSE FALSE
## 2 TRUE FALSE FALSE FALSE FALSE
## 3 TRUE FALSE FALSE FALSE FALSE
## 4 TRUE FALSE FALSE FALSE FALSE
## 5 TRUE FALSE FALSE FALSE FALSE
## 6 TRUE FALSE FALSE FALSE FALSE
## nb_group_4 nb_group_5
## 1 FALSE FALSE
## 2 FALSE FALSE
## 3 FALSE FALSE
## 4 FALSE FALSE
## 5 FALSE FALSE
## 6 FALSE FALSE
# 5 Lasso Regression
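We now fit a Lasso regression for each city, with earnings_since_2019 as the response and 10-fold cross-validation to choose the penalty \(\lambda\).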
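For reference, this is the objective that glmnet documents for family="gaussian". The notation is ours rather than the notebook's: \(N\) is the number of listings in a city, \(x_i\) the predictor vector for listing \(i\), and \(y_i\) its earnings_since_2019.

\[
\min_{\beta_0,\,\beta}\ \frac{1}{2N}\sum_{i=1}^{N}\left(y_i - \beta_0 - x_i^{\top}\beta\right)^2 + \lambda\left[\frac{1-\alpha}{2}\lVert\beta\rVert_2^2 + \alpha\lVert\beta\rVert_1\right]
\]

With alpha=1, as in the calls that follow, the ridge term vanishes and only the \(\ell_1\) penalty \(\lambda\lVert\beta\rVert_1\) remains. This is why some coefficients are shrunk exactly to zero.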
set.seed(100)
# library(glmnet)
lasso_cv_sin= cv.glmnet(as.matrix(list_after_2019.sin_clean[,!names(list_after_2019.sin_clean) %in% c("earnings_since_2019")]),list_after_2019.sin_clean[,c("earnings_since_2019")],family="gaussian",alpha=1, nfolds=10)
lasso_cv_hkg= cv.glmnet(as.matrix(list_after_2019.hkg_clean[,!names(list_after_2019.hkg_clean) %in% c("earnings_since_2019")]),list_after_2019.hkg_clean[,c("earnings_since_2019")],family="gaussian",alpha=1, nfolds=10)
lasso_cv_nrt= cv.glmnet(as.matrix(list_after_2019.nrt_clean[,!names(list_after_2019.nrt_clean) %in% c("earnings_since_2019")]),list_after_2019.nrt_clean[,c("earnings_since_2019")],family="gaussian",alpha=1, nfolds=10)
lasso_cv_tpe= cv.glmnet(as.matrix(list_after_2019.tpe_clean[,!names(list_after_2019.tpe_clean) %in% c("earnings_since_2019")]),list_after_2019.tpe_clean[,c("earnings_since_2019")],family="gaussian",alpha=1, nfolds=10)
lasso_coef.sin <- coef(lasso_cv_sin,s=lasso_cv_sin$lambda.min)
lasso_coef.hkg <- coef(lasso_cv_hkg,s=lasso_cv_hkg$lambda.min)
lasso_coef.nrt <- coef(lasso_cv_nrt,s=lasso_cv_nrt$lambda.min)
lasso_coef.tpe <- coef(lasso_cv_tpe,s=lasso_cv_tpe$lambda.min)
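The coefficients above are taken at lambda.min, the penalty with the lowest cross-validated error. glmnet also reports lambda.1se, a larger penalty within one standard error of that minimum, which usually keeps fewer features. The sketch below compares the two on simulated data. The names x, y and cv_fit and the data itself are illustrative assumptions, not part of the project.

```r
library(glmnet)

set.seed(100)
n <- 200
p <- 20
# Simulated predictors (hypothetical data, not the Airbnb listings)
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(x) <- paste0("x", seq_len(p))
# Only the first three predictors truly affect the response
y <- 3 * x[, 1] - 2 * x[, 2] + 1.5 * x[, 3] + rnorm(n)

cv_fit <- cv.glmnet(x, y, family = "gaussian", alpha = 1, nfolds = 10)

# Coefficients at the two penalties that cv.glmnet reports
coef_min <- as.matrix(coef(cv_fit, s = "lambda.min"))
coef_1se <- as.matrix(coef(cv_fit, s = "lambda.1se"))

# Count non-zero coefficients, excluding the intercept in row 1
n_min <- sum(coef_min[-1, 1] != 0)
n_1se <- sum(coef_1se[-1, 1] != 0)
cat("non-zero at lambda.min:", n_min, "\n")
cat("non-zero at lambda.1se:", n_1se, "\n")
```

If the lambda.min models look over-specified, rerunning the coef() calls with s = "lambda.1se" would be one way to check how stable the selected features are.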
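We now look at the top and bottom n coefficients for each city. To make the weights comparable, the non-zero coefficients of each city are standardised (centred and divided by their standard deviation) within that city.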
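returnDf <- function (model, input_coef)
{
  # input_coef is the one-column sparse matrix returned by coef(); model is kept for call compatibility but is not used
  # Coefficients are rounded to 2 decimals first, so anything that rounds to 0.00 is dropped along with the exact zeros
  lasso_coef_df <- data.frame(features = rownames(input_coef), coefs = round(as.vector(input_coef), 2)) %>% filter(coefs != 0)
  return (lasso_coef_df)
}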
lasso_coef_df.sin <- returnDf(lasso_cv_sin, lasso_coef.sin) %>% rename (sin_coefs = coefs)
lasso_coef_df.hkg <- returnDf(lasso_cv_hkg, lasso_coef.hkg) %>% rename (hkg_coefs = coefs)
lasso_coef_df.nrt <- returnDf(lasso_cv_nrt, lasso_coef.nrt) %>% rename (nrt_coefs = coefs)
lasso_coef_df.tpe <- returnDf(lasso_cv_tpe, lasso_coef.tpe) %>% rename (tpe_coefs = coefs)
lasso_coef_df <- full_join(lasso_coef_df.nrt, lasso_coef_df.tpe, by = "features")
lasso_coef_df <- full_join(lasso_coef_df, lasso_coef_df.sin, by = "features")
lasso_coef_df <- full_join(lasso_coef_df, lasso_coef_df.hkg, by = "features")
# Standardise (z-score) the non-zero coefficients within each city
lasso_coef_normalised.sin <- returnDf(lasso_cv_sin, lasso_coef.sin) %>% mutate(coefs = as.vector(scale(coefs)))
lasso_coef_normalised.hkg <- returnDf(lasso_cv_hkg, lasso_coef.hkg) %>% mutate(coefs = as.vector(scale(coefs)))
lasso_coef_normalised.nrt <- returnDf(lasso_cv_nrt, lasso_coef.nrt) %>% mutate(coefs = as.vector(scale(coefs)))
lasso_coef_normalised.tpe <- returnDf(lasso_cv_tpe, lasso_coef.tpe) %>% mutate(coefs = as.vector(scale(coefs)))
returnTopBtmCoefs <- function (lasso_coef_normalised, topn)
{
return (lasso_coef_normalised %>% arrange(desc(coefs)) %>% top_n(topn) %>% mutate(coefs = round(coefs,2)))
}
paintGraphCoefs <- function(lasso_coef_top5, topn, city)
{
  # A negative topn selects the bottom-n features, so label the chart accordingly
  label <- if (topn < 0) "Bottom" else "Top"
  top5_listings.fig <- plot_ly(
    x = lasso_coef_top5$features,
    y = lasso_coef_top5$coefs,
    type = "bar",
    text = lasso_coef_top5$coefs
  )
  top5_listings.fig <- top5_listings.fig %>% layout(title = paste(label, abs(topn), "Features for", city), yaxis = list(title = "Feature Weight (relative)"))
  top5_listings.fig
}
lasso_coef_top5.sin <- returnTopBtmCoefs(lasso_coef_normalised.sin, 5)
## Selecting by coefs
paintGraphCoefs(lasso_coef_top5.sin, 5, "Singapore")
lasso_coef_top5.nrt <- returnTopBtmCoefs(lasso_coef_normalised.nrt, 5)
## Selecting by coefs
paintGraphCoefs(lasso_coef_top5.nrt, 5, "Tokyo")
lasso_coef_top5.tpe <- returnTopBtmCoefs(lasso_coef_normalised.tpe, 5)
## Selecting by coefs
paintGraphCoefs(lasso_coef_top5.tpe, 5, "Taipei")
lasso_coef_top5.hkg <- returnTopBtmCoefs(lasso_coef_normalised.hkg, 5)
## Selecting by coefs
paintGraphCoefs(lasso_coef_top5.hkg, 5, "Hong Kong")
lasso_coef_btm5.sin <- returnTopBtmCoefs(lasso_coef_normalised.sin, -5)
## Selecting by coefs
paintGraphCoefs(lasso_coef_btm5.sin, -5, "Singapore")
lasso_coef_btm5.nrt <- returnTopBtmCoefs(lasso_coef_normalised.nrt, -5)
## Selecting by coefs
paintGraphCoefs(lasso_coef_btm5.nrt, -5, "Tokyo")
lasso_coef_btm5.tpe <- returnTopBtmCoefs(lasso_coef_normalised.tpe, -5)
## Selecting by coefs
paintGraphCoefs(lasso_coef_btm5.tpe, -5, "Taipei")
lasso_coef_btm5.hkg <- returnTopBtmCoefs(lasso_coef_normalised.hkg, -5)
## Selecting by coefs
paintGraphCoefs(lasso_coef_btm5.hkg, -5, "Hong Kong")
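To make the standardise-then-rank step concrete, here is a minimal sketch on a toy coefficient table. The feature names and values are hypothetical. It uses dplyr's slice_max() and slice_min() as a swapped-in alternative to top_n(), which the notebook's returnTopBtmCoefs uses; this is not the notebook's own method.

```r
library(dplyr)

# Hypothetical non-zero coefficients for one city
toy_coefs <- data.frame(
  features = c("(Intercept)", "feat_a", "feat_b", "feat_c", "feat_d"),
  coefs = c(10, 4, -2, 0.5, -6),
  stringsAsFactors = FALSE
)

# z-score the coefficients: subtract the mean, divide by the standard deviation
toy_scaled <- toy_coefs %>%
  mutate(coefs = as.vector(scale(coefs)))

# Largest and smallest standardised coefficients, ties broken by row order
top2 <- toy_scaled %>% slice_max(coefs, n = 2, with_ties = FALSE)
btm2 <- toy_scaled %>% slice_min(coefs, n = 2, with_ties = FALSE)

print(top2)
print(btm2)
```

The intercept is standardised together with the feature weights here, as it is in the notebook. Where it is much larger than the other coefficients it can dominate the ranking, so dropping the "(Intercept)" row before scaling may be worth considering.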
And in tabular form:
lasso_coef_top5.sin
## features coefs
## 1 Property_Type_Boat 5.42
## 2 Property_Type_Tiny house 0.49
## 3 (Intercept) -0.04
## 4 H_Cap -0.08
## 5 hv_manual_gov -0.10
lasso_coef_top5.nrt
## features coefs
## 1 (Intercept) 4.15
## 2 nb_group_1 0.43
## 3 Property_Type_Entire residential home 0.28
## 4 hv_manual_off_gov 0.27
## 5 H_Cap 0.16
lasso_coef_top5.tpe
## features coefs
## 1 Property_Type_Treehouse 7.16
## 2 Property_Type_Private room in resort 0.61
## 3 Property_Type_Entire villa 0.42
## 4 Property_Type_Entire loft 0.05
## 5 host_acceptance_rate 0.04
lasso_coef_top5.hkg
## features coefs
## 1 Property_Type_Private room in townhouse 1.42
## 2 nb_group_1 -0.01
## 3 (Intercept) -0.63
## 4 reviews_since_2019 -0.77
lasso_coef_btm5.sin
## features coefs
## 1 Property_Type_Private room in residential home -0.31
## 2 Amenities_AC -0.31
## 3 nb_group_2 -0.32
## 4 Property_Type_Campsite -0.32
## 5 Amenities_Wifi -0.40
lasso_coef_btm5.nrt
## features coefs
## 1 Property_Type_Entire condominium (condo) -0.14
## 2 Property_Type_Entire bungalow -0.22
## 3 House_ind -0.23
## 4 Property_Type_Entire villa -1.22
## 5 hv_phone -4.32
lasso_coef_btm5.tpe
## features coefs
## 1 hv_reviews -0.22
## 2 Property_Type_Private room in guesthouse -0.23
## 3 Property_Type_Earth house -0.41
## 4 Property_Type_Entire vacation home -0.49
## 5 (Intercept) -0.84
lasso_coef_btm5.hkg
## features coefs
## 1 Property_Type_Private room in townhouse 1.42
## 2 nb_group_1 -0.01
## 3 (Intercept) -0.63
## 4 reviews_since_2019 -0.77
A quick check on the cross-validated mean-squared error across the \(\lambda\) values tried for each city.
plot(lasso_cv_sin)
plot(lasso_cv_hkg)
plot(lasso_cv_nrt)
plot(lasso_cv_tpe)
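The plots show the whole error curve. The error at the chosen penalty can also be read off the cv.glmnet object directly. The sketch below does this on simulated data; x, y and cv_fit are illustrative stand-ins, not project variables. It relies only on the cvm, lambda, lambda.min and lambda.1se components that cv.glmnet returns.

```r
library(glmnet)

set.seed(100)
# Simulated data (hypothetical, not the Airbnb listings)
x <- matrix(rnorm(150 * 10), nrow = 150)
y <- 2 * x[, 1] - x[, 2] + rnorm(150)

cv_fit <- cv.glmnet(x, y, family = "gaussian", alpha = 1, nfolds = 10)

# Positions of the two reported penalties along the lambda path
idx_min <- which(cv_fit$lambda == cv_fit$lambda.min)
idx_1se <- which(cv_fit$lambda == cv_fit$lambda.1se)

# Mean cross-validated error (MSE for gaussian models) at each penalty
cv_summary <- data.frame(
  choice = c("lambda.min", "lambda.1se"),
  lambda = c(cv_fit$lambda.min, cv_fit$lambda.1se),
  cv_mse = cv_fit$cvm[c(idx_min, idx_1se)],
  cv_rmse = sqrt(cv_fit$cvm[c(idx_min, idx_1se)])
)
print(cv_summary)
```

Applying the same lookup to lasso_cv_sin, lasso_cv_hkg, lasso_cv_nrt and lasso_cv_tpe would give one comparable error figure per city, in the units of earnings_since_2019 once the square root is taken.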
This concludes our notebook.